Abstract


Speech  recognition  algorithms  were  analyzed  using 
normal  and  G-stressed  speech  as  an  input.  Speech  samples 
were  recorded  in  centrifuge  tests  at  the  Air  Force  Medical 
Research Lab, Wright-Patterson AFB, Ohio. All speech was
recorded  using  the  MBU-12/P  face  mask.  The  algorithms 
studied  are  phoneme-based  feature  extractors  which  feed  a 
recognition  algorithm  based  on  fuzzy  set  theory.  Three 
feature  extraction  algorithm  options  were  analyzed.  One 
option  used  a  phoneme  length  of  40  ms  and  the  other  options 
used  a  length  of  8  ms.  The  recognition  results  for  all 
three options using normal speech are above 90%, but the
40 ms phoneme length gives higher raw scores. For G-stressed
speech the 40 ms phoneme length scored greater than 90%
while the 8 ms phoneme length options scored less than 60%.


I.  Introduction


The  cockpit  tasks  for  a  fighter  pilot  have  increased 
significantly  in  the  past  35  years.  Present  technology 
offers  the  pilot  a  multitude  of  system  functions  and 
displays,  which  have  increased  the  pilot  workload 
considerably.  A  speech  recognition  system  can  be  used  to 
decrease  the  pilot  workload.  Speech  input  would  also  be 
valuable  on  low-level  missions  or  when  flying  wing,  because 
speech  input  would  enable  the  pilot  to  keep  his  eyes  out  of 
the  cockpit.  However,  most  speech  recognition  systems 
degrade  considerably  when  exposed  to  G-stress  speech 
associated  with  high  performance  aircraft. 

This  research  project  will  use  G-speech  to  analyze  a 
feature  extraction  and  speech  recognition  system.  Solutions 
to  the  G-speech  recognition  problems  associated  with  cockpit 
noise  and  stress  will  be  helpful  for  speech  recognition 
systems  used  in  other  applications,  both  military  and 
civilian.

BACKGROUND 

In 1981 at the Air Force Institute of Technology (AFIT),
Carl  Seelandt  developed  an  extensive  software  package  to 
extract  features  from  speech  (Ref  1).  Seelandt's  primary 
work  used  five-vector  phoneme  templates  to  extract  features 
from  input  speech.  Seelandt's  work  showed  promising  results 
because  of  the  ability  of  his  feature  extraction  system  to 
resynthesize  speech  from  independent  speakers  (Ref  1).  The 
resynthesized speech was recognizable from many different
speakers. However, a preliminary experiment performed in
this study had recognition results of less than 20% using
resynthesis techniques. These results are in Appendix G and
were based on resynthesized speech using Seelandt's phonemes
(extracted from Seelandt's speech), with an independent
input speaker wearing a helmet and under G-stress. The poor
recognition rate is attributed to wearing a mask (other
phonemes used in this research were extracted from subjects
wearing masks).

Further  work  in  the  feature  extraction  area  was  done  by 
Martin  in  1982  at  AFIT  (Ref  2).  Martin's  programs  use  the 
array  processor  and  can  be  used  for  feature  extraction. 
Software  was  developed  during  the  course  of  this  research  so 
Martin's  programs  could  be  used  as  part  of  a  feature 
extraction  system.  Both  Seelandt's  and  Martin's  feature 
extraction  systems  created  input  files  for  a  word 
recognition  algorithm. 

The  word  recognition  algorithm  studied  in  this  project 
is  a  new  algorithm  developed  by  Montgomery  in  1982  also  at 
AFIT  (Ref  3).  His  algorithm  is  unique  because  it  is  based 
on  fuzzy  set  theory.  Montgomery  demonstrated  better  than 
50%  recognition  results  for  independent  speakers  using  input 
data  based  on  a  feature  extraction  system  developed  by 
Seelandt  (Ref  3:78). 


PROBLEM 


Three  major  items  of  the  feature  extraction  system  were 
investigated.  The  first  item  is  the  length  of  phonemes 
contained  in  a  phoneme  template.  Five-vector  length 
phonemes (40 ms) and one-vector length phonemes (8 ms) were
studied.  The  phonemes  are  compared  to  input  speech  to  find 
the  distances  between  each  phoneme  and  the  speech. 

The distance rule will also be studied. Seelandt's
feature extraction system uses a distance rule called
Minkowski one (M1). The M1 distance was chosen for its
computational simplicity. This thesis project will study
the difference between the M1 and Minkowski two (M2) distances.

The  third  item  studied  is  the  averaging  of  phonemes  in 
the  phoneme  template.  The  averaging  of  phonemes  is  an 
option  with  the  feature  extraction  system  software  and  has 
not been studied before. The averaging of phonemes is
expected to reduce the number of phonemes needed and make the
phoneme template set more robust.

Approach 

Five-vector  and  one-vector  phoneme  templates  were 
developed  from  15  speech  files.  The  phoneme  templates 
consisted  of  70  sounds  extracted  from  prerecorded  speech 
of a vocabulary "zero" to "nine", "CCIP", "enter",
"frequency", "step", and "threat". These two templates were
used  by  the  feature  extraction  system  to  create  feature 
extraction  files  from  90  speech  files.  The  feature 
extraction  files  were  entered  into  the  speech  recognition 
program to conduct four experiments. The experiments are:

EXPERIMENT    PHONEME LENGTH    DISTANCE RULE

    1         5-VECTOR          M1
    2         1-VECTOR          M1
    3         1-VECTOR          M2
    4         1-VECTOR          M2

(The fourth experiment is the same as the third but with
different fuzzy variables and word representation used in
the recognition program.)

The speech files used in the experiments consisted of
normal (1 G) speech and G-stressed speech. All speech was
recorded with the subjects wearing a mask. The phoneme
templates were extracted from normal (1 G) speech with
subjects wearing a mask.

Sequence  of  Presentation 

Chapter  II  covers  data  acquisition  and  how  the  speech 
files  were  prepared  before  the  feature  extraction.  The 
feature  extraction  system  is  discussed  in  Chapter  III  with 
emphasis  on  the  phoneme  templates.  Chapter  IV  discusses  the 
word  recognition  algorithm  and  how  phoneme  representations 
are  picked.  Results  are  in  Chapter  V,  with  conclusions  and 
recommendations  in  Chapters  VI  and  VII. 

Speech  files  and  computer  programs  are  in  Appendices  A 
thru  F.  Appendix  G  contains  other  experiments  which 
include: 

1.  Resynthesized  speech  experiment 

2.  Independent  speaker  experiment 

3.  128-point  DFT  recognition  experiment 
II.  Data Acquisition


The  original  data  tapes  for  this  research  were 
generated  by  the  Aerospace  Medical  Research  Laboratory 
(AMRL),  Wright-Patterson  AFB,  Ohio.  Volunteers  were 
subjected  to  different  G-levels  in  the  human  centrifuge.  A 
standard  vocabulary  was  used  which  consisted  of  the  words: 
zero,  one,  two,  three,  four,  five,  six,  seven,  eight,  nine, 
CCIP,  enter,  frequency,  step,  and  threat. 

Subjects  in  the  human  centrifuge  were  seated  in  an  F-16 
seat,  at  a  30°  bank  angle  with  shoulder  pads.  Subjects  were 
wearing  an  HGU  48P  helmet  with  an  MBU  12P  mask  connected  to 
a  CRU  66A  Bendix  regulator.  In  addition  to  repeating  the 
standard  vocabulary,  the  subjects  were  simultaneously  doing 
a  pitch  axis  tracking  task  with  a  side  arm  force  stick. 

The  normal  and  G-stress  speech  utterances  were  recorded 
by  AMRL  on  a  small  portable  Nagra  SN  tape  recorder  operating 
at  3.75  IPS.  These  original  recordings  were  transferred  by 
AMRL  to  quarter  inch  tape  using  a  Nagra  IV-D  at  7.5  IPS. 

Analog to Digital Conversion

The  audio  equipment  was  connected  as  shown  in  Figure  1. 
The  sampling  rate  of  the  input  speech  waveform  for  analog  to 
digital (A/D) conversion was 8 kHz. The data was low-pass
filtered at 3.6 kHz with a -48 dB/octave slope above the 3.6
kHz break frequency, which satisfies the Nyquist sampling
criterion. (For more information on the analog to digital
interface for the Nova 2 computer see Reference 4.)


[Figure 1 block diagram. Components shown: preamp section;
Rockland low/high-pass filter; Cromemco D/A and A/D
converters; 10 dB attenuator; Crown amplifier; speakers;
Nova 2 computer running program AUDIOHIST; Eclipse S/250
computer.]

Figure 1. Equipment for Analog to Digital Conversion


The  computer  program  used  to  digitize  the  recorded 
speech was "Audiohist", written by Paul Finkes and J. Hunter
(Ref 5, 6). "Audiohist" digitizes each speech utterance into
one file of 88 disk blocks. Each disk block holds 256 16-bit
integers, so the 88 disk blocks enable 2.816 seconds of data
to be stored in each speech file. Program "Audiohist" also
enables the user to play back the digitized speech.
Digitized speech played back
from  "Audiohist"  has  no  audible  difference  from  the  analog 
input.  The  "Audiohist"  program  was  also  used  to  edit  each 
file  from  88  blocks  to  a  smaller  block  size  to  save  file 
space.  The  files  ranged  in  size  from  15  to  40  blocks  after 
editing.  In  some  files  breathing  noises  were  kept  in  the 
file  along  with  the  word  to  be  analyzed.  These  noises  will 
be  used  in  the  analysis  of  the  feature  extraction  algorithm 
and  the  recognition  algorithm.  Another  computer  program 
developed  by  Allen  can  also  be  used  for  digitizing  data  with 
similar  results  to  "Audiohist"  (Ref  7). 
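
The file-length figure quoted above follows directly from the block size and the 8 kHz sampling rate. The short sketch below reproduces that arithmetic; the function and constant names are illustrative only and are not taken from "Audiohist".

```python
SAMPLE_RATE_HZ = 8000      # A/D sampling rate stated in the text
SAMPLES_PER_BLOCK = 256    # one disk block holds 256 16-bit integers


def blocks_to_seconds(n_blocks, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of speech stored in n_blocks disk blocks (hypothetical helper)."""
    return n_blocks * SAMPLES_PER_BLOCK / sample_rate_hz


full_file_s = blocks_to_seconds(88)   # unedited file
shortest_s = blocks_to_seconds(15)    # smallest edited file
longest_s = blocks_to_seconds(40)     # largest edited file
```

Under these figures an unedited file holds 2.816 seconds of speech, and the edited files of 15 to 40 blocks hold roughly 0.5 to 1.3 seconds.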

The Cromemco A/D converter has a voltage range of ±5
volts and a 12-bit word. Therefore +5 volts would be equal
to 2047 stored in the computer.
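
As a rough illustration of the stored values, the sketch below maps a voltage to an integer count so that +5 volts corresponds to 2047, as stated above. The linear mapping, the clipping, and the lower limit of -2048 are assumptions made for illustration; the actual converter behavior is not described in the text.

```python
FULL_SCALE_VOLTS = 5.0   # positive full-scale input from the text
MAX_COUNT = 2047         # value stored for +5 volts, per the text
MIN_COUNT = -2048        # assumed lower limit of a signed 12-bit word


def volts_to_counts(volts):
    """Assumed linear mapping from input voltage to stored integer."""
    counts = round(volts / FULL_SCALE_VOLTS * MAX_COUNT)
    # Clip inputs beyond the converter range (assumed behavior).
    return max(MIN_COUNT, min(MAX_COUNT, counts))
```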

The  digitized  speech  files  are  stored  as  integers  on 
disk  and  backed  up  on  magnetic  tape.  The  files  used  in  this 
research  project  are  listed  in  Appendix  A.  The  file  names 


or  for  G-stress  utterances: 

X  X  #  X  X  .  SP 

same  as  above 
same  as  above 

specifies  the  G-force  in  the  Y  axis 
specifies  the  G-force  in  the  Z  axis 
same  as  above 

Example: HCBE.SP is a speech file (SP) from the volunteer
Capt Henwood (H), and is the second (B), control (C)
utterance  and  the  word  spoken  was  "enter"  (E).  H30F.SP  is 
the  word  "frequency"  taken  from  Capt  Henwood  at  three  Gs. 

The  digitized  speech  files  created  and  edited  using 
"Audiohist"  are  processed  by  feature  extraction  algorithms. 
The  resulting  data  from  the  feature  extraction  algorithms  is 
then  processed  for  recognition.  This  will  be  described  in 
more  detail  in  the  next  two  chapters. 


III.  Feature  Extraction 


The  feature  extraction  system  used  in  this  research 
was  based  on  work  done  by  Carl  Seelandt  (Ref  1).  Seelandt's 
feature  extraction  system  was  based  on  finding  distances 
between  a  phoneme  template  and  speech  utterances.  Three 
different  items  were  studied  using  the  procedures  and 
programs  developed  by  Seelandt.  The  three  items  studied 
are: optimum phoneme length, which distance rule to use,
and analysis of phoneme averaging. Phoneme lengths of five
vectors (40 ms) and one vector (8 ms) will be studied. The
distance rule used by Seelandt was Minkowski one distance
(M1), picked because of M1's computational advantage over
other rules. Minkowski two distance (M2) will be used in the
feature extraction process and compared to M1 distance.
Seelandt  also  developed  software  to  create  phoneme  templates 
which  included  the  ability  to  average  multiple  source  files 
into individual phonemes. The use of averaged phoneme
templates for feature extraction was also studied. The
sequence  to  follow  for  feature  extraction  is  depicted  in 
Figure  2. 

Discrete  Fourier  Transform 

A  discrete  Fourier  transform  (DFT)  was  used  to  convert 
digitized  speech  files  into  frequency  component  files.  The 
DFT  process  accepts  N  input  samples  from  the  digitized 
speech  files,  where  N  is  some  power  of  two.  The  DFT  size 
used  for  this  study  set  N  equal  to  64,  which  results  in  32 
components from dc through 3750 Hz in 125 Hz increments.
Thus, frequency component files contain vectors of 32
components and each vector corresponds to 8 ms of original
speech.
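
The framing described above can be sketched as follows. This is a minimal illustration, not the thesis software: it assumes non-overlapping 64-sample frames (consistent with one vector per 8 ms), uses a direct DFT sum rather than a fast transform, and omits the windowing, normalization, and preemphasis that the actual programs applied. All names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # sampling rate from Chapter II
N = 64                  # DFT size used in this study


def dft_magnitudes(frame):
    """Magnitudes of the first len(frame)/2 DFT bins of a real frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def speech_to_vectors(samples, n=N):
    """Split samples into non-overlapping n-sample frames (an assumption)
    and transform each frame into one frequency-component vector."""
    return [dft_magnitudes(samples[s:s + n])
            for s in range(0, len(samples) - n + 1, n)]


bin_spacing_hz = SAMPLE_RATE_HZ / N      # spacing between components
frame_ms = 1000 * N / SAMPLE_RATE_HZ     # speech covered by one vector
```

With a 64-point transform at 8 kHz, each vector has 32 components spaced 125 Hz apart and covers 8 ms of speech, matching the figures given in the text.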

This  research  effort  uses  two  different  sets  of 
programs  to  extract  features  from  speech.  The  first  set  of 
feature  extraction  programs,  developed  by  Karl  Seelandt, 
used a 64-point DFT, normalized each vector, and used a 6 dB
per octave preemphasis with a corner frequency of 500 Hz
(Ref 1). The next set of feature extraction programs was
developed by Martin (Ref 2). Martin's programs used a
preemphasis above 500 Hz of 10 dB per octave and a deemphasis
of 10 dB per octave below 300 Hz, replaced the dc component
of each vector with the vector energy, and were used for
single-vector phoneme analysis. The energy in each vector
was added to the feature extraction output. Both sets of
programs used a Hamming window as recommended by Finkes (Ref
5:22). The Hamming window also produced a cleaner
spectrogram than the rectangular window did. The
spectrograms  were  used  as  one  tool  to  pick  phoneme 
templates. 
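
For reference, the Hamming window mentioned above can be sketched as below. The coefficients 0.54 and 0.46 are the standard textbook definition; whether the thesis programs used exactly this form is an assumption, and the function names are illustrative.

```python
import math


def hamming(n):
    """Standard n-point Hamming window (textbook definition)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


def apply_window(frame):
    """Taper one frame of samples before the DFT is taken."""
    weights = hamming(len(frame))
    return [x * w for x, w in zip(frame, weights)]
```

The window tapers each 64-sample frame toward its edges, which reduces spectral leakage compared with the rectangular window and is consistent with the cleaner spectrograms reported.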

Phoneme  Templates 

Extracting  features  from  speech  frequency  files  was 
accomplished  by  comparing  phoneme  templates  to  the  speech 
files.  Producing  phoneme  templates  involved  using 
procedures  and  software  tools  developed  by  Seelandt  (Ref  1) 
and extended to Martin's programs with new software tools
developed by this author. Phoneme templates of five-vector
and one-vector lengths were developed. In addition,
averaging different vectors into each phoneme of the phoneme
set was investigated and used for the first time; Seelandt
did not have time to investigate the use of averaged
phoneme templates.

Five-Vector Phonemes. To make the five-vector
phonemes,  techniques  similar  to  those  used  by  Seelandt  were 
used.  First  a  spectrogram  was  made  using  program  TEKTALK, 
developed  by  Seelandt  (Ref  1)  and  modified  by  Fletcher  (Ref 
8).  Program  TEKTALK  presents  a  spectrogram  of  the  input 
speech  on  a  Tektronix  Scope.  A  segment  of  speech  represented 
by  the  spectrogram  could  be  heard  by  placement  of  the 
Tektronix's  cursors  on  the  spectrogram.  Vectors  were  picked 
to  be  in  the  phoneme  template  by  listening  to  speech 
segments  and  looking  at  spectrograms  generated  from  the 
speech  files.  This  may  seem  rather  ad  hoc,  but  this  method 
attempts  to  pick  out  the  consistent  components  of  speech 
to be used as a phoneme template. Figures 3 through 17 are
spectrograms of the fifteen source files from which all the
phoneme templates for this research were extracted.
Table I lists, for five-vector and one-vector phoneme
templates, the origin of each phoneme. In addition, Table I
shows  vectors  that  were  used  for  speech  synthesis.  A  speech 
synthesis  experiment  is  discussed  in  Appendix  G. 

Table I

Phoneme Template Source
(All phonemes from files with HCP prefix)

PHONEME   WORD        5-VECTOR PHONEME   SINGLE VECTOR     VECTOR FOR
NUMBER                START VECTOR       START VECTOR /    SPEECH
                                         TIMES MODIFIED    SYNTHESIS

   1      noise
   2      zero        10,13              10/7
   3      zero        18                 18/5
   4      zero        28                 28/5
   5      zero        35                 35/5
   6      zero        45                 45/5
   7      zero        52                 52/5
   8      one         10                 10/3
   9      one         16                 16/7
  10      one         23                 23/5
  11      one         29                 29/5
  12      one         34                 35/6
  13      one         40                 39/5
  13      seven       65                 62/5
  13      nine        64                 64/5
  13      enter       30                 30/4
  14      two         10                 10/5
  15      two         17,22              17/8
  16      two         27,30              25/8
  17      two         35,39              35/9
  18      three       11                 9/3
  19      three       16,20              17/5
  20      three       27                 27/4
  21      three       34,40              38/7
  22      three       47                 47/4
  23      four        7                  7/5
  24      four        15,20,25,          15/19
  25      four        36                 36/4
  26      four        43                 44/4
  27      four        50                 51/4
  28      five        12                 11/3
  29      five        22,27,32,          21/23
  30      five        42,47              44/8
  31      five        54,57              54/6
  32      six         13,18,22           13/13
  33      six         27,32              27/9
  34      six         38,43              39/8
  35      six         60,62              61/5
  36      six         69,74,79           69/17
  37      seven       11,16,21           11/16
  38      seven       29                 29/5
  39      seven       35,40              35/10
  40      seven       45,47              45/6
  41      seven       52,55              53/7
  42      seven       61                 62/3

Table I (Continued)

Phoneme Template Source
(All phonemes from files with HCP prefix)

PHONEME   WORD        5-VECTOR PHONEME   SINGLE VECTOR     VECTOR FOR
NUMBER                START VECTOR       START VECTOR /    SPEECH
                                         TIMES MODIFIED    SYNTHESIS

  43      eight       12,17,23           15/19             20
  43      eight       28,33
  44      eight       54,59              54/9              60
  45      nine        22,27,32           22/15             24
  46      nine        39,44              40/7              43
  47      nine        51,56              51/10             55
  48      CCIP        44,49              41/10             49
  49      CCIP        55,60              55/10             60
  50      CCIP        71,76              71/11             73
  51      CCIP        82                 82/4              83
  52      CCIP        98                 98/3              100
  53      CCIP        104,109            104/19            110
  53      CCIP        114,119
  54      enter       19,24              21/8              22
  55      enter       39                 40/3              41
  56      enter       48,53,58           48/13             48
  57      frequency   19                 19/3              20
  58      frequency   25                 25/4              26
  59      frequency   32,34              32/7              34
  60      frequency   46                 47/3              47
  61      frequency   50                 51/4              52
  62      frequency   55                 56/3              57
  63      frequency   69                 68/7              70
  64      frequency   78,83              78/10             83
  65      step        15,20              15/11             23
  66      step        39,41              39/7              42
  67      step        46,49              46/8              50
  68      threat      6,10               6/6               8
  69      threat      27,32              28/7              30
  70      threat      54,59              55/7              57
  71      noise       -                  [zero] 59/5       -
  72      noise       -                  [one] 48/5        -
  73      noise       -                  [one] 54/10       -
  74      noise       -                  [two] 1/6         -
  75      noise       -                  [three] 57/6      -
  76      noise       -                  [six] 55/5        -
  77      noise       -                  [eight] 43/10     -
  78      noise       -                  [nine] 1/10       -
  79      noise       -                  [nine] 90/7       -
  80      noise       -                  [step] 63/10      -
  81      noise       -                  [threat] 40/14    -

Seelandt tried to pick his phonemes to represent
distinguishable sounds found in his speech utterances. In
this research, phonemes were picked in a similar manner;
however,  distinct  sounds  were  not  the  only  input  for 
picking a phoneme. Each speech file represented by the
spectrograms in Figures 3 through 17 was examined and
listened to using program TEKTALK. Phonemes were picked not
only according to sound, but according to how similar the
spectrogram vectors were. An attempt was made to pick the
vectors that had a consistent spectrographic pattern. The
spectrogram for the word "eight" shows a very strong
spectrographic pattern for the "a" sound in "eight".
Because of that strong pattern, it was decided to average
as many vectors as possible into one phoneme without
changing the overall pattern of the "a" sound. Initial
results showed that averaging did work and extracted
features as well as multiple phonemes for the "a" sound.
When three phonemes were used for the "a" sound in "eight",
all three phonemes would come up as the top choice in the
feature extraction system. When one phoneme was used for
the "a" sound in "eight", it replaced all three as the
feature extraction choice. Thus, initial results show that
an averaged phoneme could replace multiple phonemes, thereby
reducing the number of phonemes needed for
each word. Phoneme templates were created interactively by
bringing  in  speech  that  consisted  of  frequency  components. 
The  frequency  components,  which  represent  the  original 
speech,  were  used  as  templates  by  picking  out  vectors  to  be 
phonemes. The program would take the beginning vector of
the phoneme to be added, find it in the speech, and add it
to the template. The above procedure is done for several
phoneme sounds in each word until the template is filled out
from the various speech files. After the template is
formed, it is available to the program that finds the
distance between the input speech and the phoneme template.

Figure 13. Spectrogram of "CCIP"

Figure 15. Spectrogram of "FREQUENCY"
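
The averaging of several spectrogram vectors into one phoneme, as described for the "a" sound in "eight", can be sketched as a component-wise mean. This is an illustration only: the template-building programs are not listed in this chapter, the helper names are hypothetical, and the reading of a Table I entry as a 1-based start vector plus a count of vectors is an assumption.

```python
def average_vectors(vectors):
    """Component-wise mean of equal-length frequency-component vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def make_averaged_phoneme(speech_vectors, start, count):
    """Average `count` consecutive vectors beginning at 1-based index
    `start` (assumed interpretation of a start/count pair)."""
    return average_vectors(speech_vectors[start - 1:start - 1 + count])
```

Averaging keeps the template the same size as a single vector, which is why one averaged phoneme can stand in for several individually picked ones.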

One-Vector Phonemes. Templates of single-vector
phonemes were studied using new programs developed by
Martin. Martin's programs used the array processor and
extended memory available in the Eclipse S/250 computer.
His programs used one-vector phoneme templates and found the
distance between these templates and input speech files.
In addition, Martin's programs can be changed easily to
study different DFT sizes or to change preemphasis or
deemphasis as needed for speech study.

Three  programs  were  developed  to  interface  and  use 
Martin's programs in the word recognition cycle.
Single-vector phonemes were developed using the same files
as used for five-vector phonemes (Figures 3 through 17).
Phonemes were picked to be as close to the five-vector
phonemes as possible. Table I shows that the one-vector
phonemes came from almost the same vectors. In some cases,
one or two vectors were left off the one-vector phoneme
in order to make the spectrogram characteristics more
uniform. Figure 19 is a spectrogram of the single-vector
phonemes. The single-vector phoneme template includes
twelve noise vectors, which were treated as one phoneme by
the feature extraction program. The five-vector phonemes
had only two noise vectors.


Thresholding 

Thresholding  was  used  to  find  the  beginning  and  ending 
of  words  for  isolated  word  recognition.  Seelandt's 
programs,  which  were  used  to  study  the  five-vector  phoneme 
templates,  used  a  simple  thresholding  to  find  the  beginning 
and  end  of  words.  When  a  threshold  is  set  in  Seelandt's 
program,  any  sound  below  that  threshold  is  muted  and 
represented  by  no  frequency  components  in  the  output.  Thus, 
when  any  distance  routine  was  used,  the  phoneme  that  was 
most  like  the  thresholded  material  was  phoneme  number  one. 
Phoneme  number  one  consisted  of  very  few  frequency 
components  (the  least  amount  in  the  phoneme  template). 
Thus,  if  the  thresholding  worked  properly,  it  looked  as  if 
Seelandt's  program  picked  noise  from  the  file  very  well. 
This  simple  threshold  technique  was  effective  in  laboratory 
speech,  but  not  effective  for  the  speech  used  in  this  study. 

For  studying  one-vector  phoneme  templates  there  is  no 
thresholding  incorporated  to  set  the  frequency  components 
to zero. Thresholding for single-vector work was done after
the distance routine and was used to find the beginning and
end of words by using the energy in each vector. However, it
was found that a simple thresholding technique did not do
very well on speech which was recorded using a mask, because
of breathing and exhaling. The threshold would sometimes
set the beginning and end of words erroneously when
breathing or exhaling exceeded the threshold level set.
Therefore, words could be represented by a feature
extraction string much longer than the actual word should
have been. In order to minimize this problem, a simple
algorithm was devised to ignore short transients above the
set threshold. The new thresholding algorithm would ignore
transients shorter than five vectors (40 ms). This algorithm
worked better than simple thresholding and can be found in
program TOP5, which prepares feature extraction files for the
recognition routine program LEARN.
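
A minimal sketch of the transient-rejecting threshold is given below. It is not the code from the thesis program; it only illustrates the stated rule that runs of above-threshold vectors shorter than five vectors (40 ms) are ignored. The function name, the use of a strict "greater than" comparison, and the choice to span from the first kept run to the last are assumptions.

```python
def find_word_bounds(energies, threshold, min_run=5):
    """Return (first, last) vector indices of the word, or None.

    A run of consecutive vectors whose energy exceeds `threshold`
    is kept only if it lasts at least `min_run` vectors, so short
    breath or exhale transients do not set the word boundaries.
    """
    runs = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(energies) - 1))

    kept = [(s, e) for s, e in runs if e - s + 1 >= min_run]
    if not kept:
        return None
    return kept[0][0], kept[-1][1]
```

For example, with a threshold of 5 the energy sequence `[0, 0, 9, 0, 0, 9, 9, 9, 9, 9, 9, 0, 0, 9, 9, 0]` yields bounds `(5, 10)`: the one-vector and two-vector bursts are discarded.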

Distance  Rule 

After  the  phoneme  template  is  formed  the  input  speech 
can  be  entered  into  the  program  which  finds  the  distance 
between  each  vector  of  speech  and  every  phoneme  in  the 
template.  The  distance  routines  are  seen  in  Table  II  below. 


The  two  different  distance  rules  in  Table  II  were 
studied  by  this  research  and  are  based  on  two  cases  of  the 
Minkowski  distance  rule.  Seelandt's  programs,  using  the 
five-vector phonemes, used the M1 distance rule. The M1
and M2 distance rules were used for single-vector phonemes.
The results of the single-vector phonemes will be used to
compare the M1 and M2 distances.

The  distance  routines  were  used  to  find  the  distance 
between  a  phoneme  template  and  the  speech  files.  The 
distance was found between the frequency components (32 x 5
array) of the five-vector phoneme template and the
equivalent number of components in the speech. The phoneme
represented 40 ms of speech and distances were calculated at
each 8 ms interval on the speech input. The single-vector
phoneme templates represented 8 ms of speech. The distance
was calculated for each vector of speech (32 frequency
components) against the same number of components in the
phoneme.
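
The two distance rules and the sliding comparison can be sketched as follows. The formulas are the standard Minkowski definitions of order one (sum of absolute differences) and order two (Euclidean distance); whether the thesis programs take the final square root for M2 is not stated here, and omitting it would not change which phoneme is closest. The function names are illustrative.

```python
def minkowski(a, b, order):
    """Standard Minkowski distance of the given order."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** order for x, y in zip(a, b)) ** (1.0 / order)


def m1(a, b):
    """Minkowski one: sum of absolute component differences."""
    return minkowski(a, b, 1)


def m2(a, b):
    """Minkowski two: Euclidean distance."""
    return minkowski(a, b, 2)


def sliding_distances(speech, phoneme, dist=m1):
    """Distance between a phoneme of one or more vectors and every
    equally long window of speech, stepping one vector (8 ms) at a time."""
    width = len(phoneme)
    flat_phoneme = [c for vector in phoneme for c in vector]
    distances = []
    for s in range(len(speech) - width + 1):
        window = [c for vector in speech[s:s + width] for c in vector]
        distances.append(dist(window, flat_phoneme))
    return distances
```

A five-vector phoneme is compared as a 32 x 5 block against each five-vector window of speech, while a one-vector phoneme reduces to a vector-by-vector comparison. M1 needs only subtractions and additions, which is the computational advantage noted earlier.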

Five  Top  Choices 

The  end  product  of  the  feature  extraction  system  is 
five top choices of phonemes for each 8 milliseconds of input
speech. In addition, each of these five choices is
scaled from 100 to zero. The top choice is 100 and the
last phoneme choice (not the fifth choice) corresponds
to 0. Since only the top five choices are listed, the
scale usually shows 100 for the top phoneme and 80 to 90 or
even 50 for the fifth phoneme choice. In addition, there is
a scale factor for each vector (8 ms) of speech. The scale
factor is calculated as follows:

$$\text{SCALE FACTOR} = \frac{\text{vector minimum phoneme distance}}{\max\{\text{minimum phoneme distance in file}\}}$$

The five top choices use the following formula to scale each
of the five choices in a vector:

$$\text{SCALE} = \frac{\text{vector maximum distance} - \text{choice distance}}{\text{vector maximum distance} - \text{vector minimum distance}}$$

This scale was used because program LEARN uses this scale
information for word scoring (Ref 3).
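
The two scaling formulas can be illustrated with the sketch below. It is not the thesis code: the function names are hypothetical, and the multiplication by 100 is an assumption made to match the stated 100-to-zero range, since the formula as printed yields a fraction between zero and one.

```python
def top_five(distances):
    """Rank phonemes for one vector of speech.

    `distances` is a list of (phoneme_number, distance) pairs covering
    every phoneme in the template. Returns up to five
    (phoneme_number, score) pairs, best first.
    """
    d_min = min(d for _, d in distances)
    d_max = max(d for _, d in distances)
    span = d_max - d_min
    ranked = sorted(distances, key=lambda pair: pair[1])[:5]
    # Factor of 100 is assumed so the closest phoneme scores 100
    # and the farthest phoneme in the template scores 0.
    return [(p, 100.0 * (d_max - d) / span if span else 100.0)
            for p, d in ranked]


def scale_factors(min_distances):
    """Per-vector scale factor: each vector's minimum phoneme distance
    divided by the largest such minimum in the file."""
    peak = max(min_distances)
    return [d / peak if peak else 0.0 for d in min_distances]
```

Because the score is measured against the farthest phoneme in the whole template rather than the fifth choice, the fifth listed score is usually well above zero, as the text notes.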

The  programs  used  by  Seelandt  to  process  each  speech 
file  in  the  feature  extraction  system  consisted  of  programs 
called  TRYDIST5  and  LISTER4.  TRYDIST5  and  LISTER4  were 
modified  to  output  the  data  as  listed  in  Figure  22. 
TRYDIST5  and  LISTER4  were  modified  by  Montgomery,  and 
renamed PHDIST and CHOICE5 respectively. To use Martin's
programs, the program TOP5 was developed to link his programs
to the recognition algorithm program developed by
Montgomery. In addition to listing the top five choices and
the scale factor, program TOP5 also lists the energy for
each vector in the speech file. Figure 20 shows output from
the program TOP5.

After  speech  inputs  are  processed  by  the  feature 
extraction  system,  the  output  from  the  feature  extraction 
system  is  used  for  the  recognition  program  described  in  the 
next  chapter. 
Figure 20. Feature Extraction Output


IV.  Word Recognition Algorithm


The word recognition algorithm used in this research
was developed by Gerard Montgomery and uses fuzzy set theory
for isolated word recognition (Ref 3). To use the word
recognition program, feature extraction output in the exact
form of Figure 20 or Figure 22 is stored in files to be used by
program  LEARN.  Program  LEARN  prompts  the  user  for  a  phoneme 
representation  when  each  new  word  is  encountered. 


Phoneme  Representation 

Each  word  to  be  recognized  by  program  LEARN  must  have  a 
phoneme  representation.  This  representation  can  be  set 
according  to  the  features  extracted  earlier.  Phoneme  #2 
through  phoneme  #7  represent  the  word  "zero"  in  the  phoneme 
template.  The  logical  phoneme  representation  for  "zero" 
would  be  2-3-4-5-6-7.  However,  when  program  LEARN  scores  a 
file  against  a  phoneme  representation  it  may  delete  some  of 
the  phonemes  which  are  listed  in  the  phoneme  representation. 
These deletions can degrade the algorithm's performance.
Therefore,  a  phoneme  representation  was  picked  to  minimize 
the  number  of  deletions  found  when  statistics  are  gathered 
by  program  LEARN. 

The first step in picking a phoneme representation is to choose an initial representation for each word. These initial representations were used to gather statistics on 45 input files. The 45 input files covered the vocabulary "zero" through "nine", "CCIP", "enter", "frequency", "step", and "threat". After statistics are gathered, the phoneme representation can be changed to one that minimizes the number of deletions. Next, program LEARN is used to recognize the same 45 training files, and the recognition results show how well the phoneme representation performed. An example of minimizing the deletions follows. Suppose the representation 2-3-4-5-6-7 is picked for the word "zero" and phonemes 5, 6 and 7 are found to have been deleted three times. It may be possible to eliminate just phoneme 5 and have zero deletions for the new phoneme representation 2-3-4-6-7.

Phoneme representations were developed for all fifteen utterances using the trial and error technique described above. The technique was applied only to the training files and only to the five-vector phoneme templates; the single-vector phoneme templates use the same phoneme representations as the five-vector templates. The phoneme representation given in Table III was picked after several trials, each of which consisted of changing the phoneme representation, collecting statistics, and running recognition on the training set. Once recognition results of 100% were obtained on the training files, word recognition was run on a new set of 45 speech files (non-training files) to find the actual performance of the recognition program.
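
The trial and error search described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure coded in program LEARN: the function `prune_representation` and the callback `count_deletions` are hypothetical names, and the callback stands in for the deletion statistics that LEARN gathers over the training files. The toy statistics function simply reproduces the "zero" example from the text.

```python
from typing import Callable, List, Sequence, Tuple


def prune_representation(
    rep: Sequence[int],
    count_deletions: Callable[[List[int]], int],
    min_len: int = 2,
) -> Tuple[List[int], int]:
    """Greedy trial and error: drop one phoneme at a time while deletions fall.

    count_deletions is a hypothetical stand-in for the statistics pass of
    program LEARN; it returns the number of deletions found for a candidate
    representation over the training files.
    """
    best = list(rep)
    best_count = count_deletions(best)
    while best_count > 0 and len(best) > min_len:
        # Try every representation that is one phoneme shorter.
        candidates = [best[:i] + best[i + 1:] for i in range(len(best))]
        counts = [count_deletions(c) for c in candidates]
        lowest = min(counts)
        if lowest >= best_count:
            break  # no single removal helps, so stop the search
        best = candidates[counts.index(lowest)]
        best_count = lowest
    return best, best_count


def toy_statistics(rep: List[int]) -> int:
    # Toy stand-in: pretend phoneme 5 is what triggers three deletions.
    return 3 if 5 in rep else 0


new_rep, deletions = prune_representation([2, 3, 4, 5, 6, 7], toy_statistics)
print(new_rep, deletions)
```

In practice each call to the statistics pass is expensive, which is why the thesis limited the search to the training files and the five-vector templates.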
TABLE III


PHONEME REPRESENTATION USED FOR VOCABULARY
IN WORD RECOGNITION ALGORITHM

WORD          PHONEME REPRESENTATION

ZERO          2 - 3 - 6 - 7
ONE           8 - 9 - 10 - 12 - 13
TWO           14 - 15 - 17 - 7
THREE         18 - 19 - 21 - 22
FOUR          23 - 24 - 25 - 27
FIVE          28 - 29 - 30 - 31
SIX           32 - 33 - 1 - 35 - 36
SEVEN         37 - 39 - 40 - 42 - 13
EIGHT         43 - 1 - 44
NINE          13 - 29 - 47 - 13
CCIP          37 - 49 - 36 - 49 - 53
ENTER         54 - 13 - 56
FREQUENCY     28 - 33 - 1 - [?] - 13 - 64
STEP          65 - 1 - 66 - 67
THREAT        68 - 19 - 69 - 70


Program  LEARN 


Program  LEARN,  the  word  recognition  program,  uses  a  set 
of  fuzzy  variables  for  recognition  scoring.  These  fuzzy 
variables  can  be  different  for  each  word.  The  fuzzy 
variables  used  in  this  research  are  listed  in  Figure  21. 
The values listed in Figure 21 are variables which can be changed to improve the performance of program LEARN. The allowable limits for each variable and its meaning can be found in Montgomery's thesis (Ref 3). The values presented in Figure 21 must be used to duplicate the results in this report.


THE OVERALL FUZZY VARIABLES THAT WERE USED FOLLOW

STHR  = 1.0E+00
SUBE  = 1.0E+00
SUBF  = 5.0E-01
INSE  = 1.3E+00
INSF  = 5.0E-01
DELE  = 1.0E+00
DELF  = 8.0E-01
DELG  = 1.0E-01
DCNE  = 1.0E+00
DCNF  = 1.2E+00
DCNG  = 5.0E-01
SFE   = 2.0E+00
SFF   = 2.0E+00
CHVE  = 4.0E+00
CHVF  = 2.5E-01
STATE = 1.0E+00
STATF = 3.0E+00
STATG = 0.0E+00
THR1E = 1.0E+00
THR1F = 7.5E-01
THR2E = 1.0E+00
THR2F = 5.0E-01

THE WORD FUZZY VARIABLES FOLLOW

WSTHR  = 8.0E-01
WSUBE  = 1.0E+00
WSUBF  = 5.0E-01
WINSE  = 1.3E+00
WINSF  = 5.0E-01
WDELE  = 1.0E+00
WDELF  = 8.0E-01
WDELG  = 1.0E-01
WDCNE  = 1.0E+00
WDCNF  = 1.2E+00
WDCNG  = 5.0E-01
WSFE   = 2.0E+00
WSFF   = 2.0E+00
WCHVE  = 4.0E+00
WCHVF  = 2.5E-01
WSTATE = 1.0E+00
WSTATF = 3.0E+00
WSTATG = 7.0E-01
WTHR1E = 1.0E+00
WTHR1F = 7.5E-01
WTHR2E = 1.0E+00
WTHR2F = 5.0E-01

Figure  21.  Fuzzy  Variables  Used  For  All  Words 
V. Results


The results for five-vector phoneme templates using the M1 distance are listed in Table IV, along with the results for one-vector phoneme templates using both the M1 and M2 distances. For normal speech the results are similar for all three options (remember that all speech was from subjects wearing a mask). For G-speech, however, the five-vector and one-vector phoneme templates give different results in both feature extraction and recognition. Table IV shows that G-speech has higher recognition scores with the five-vector phoneme template than with the one-vector template.

The  recognition  files  listed  in  Table  IV  are  labeled  C 
for  control  files,  3  for  3g  files,  and  5  for  5g  files.  In 
Tables  V  -  VIII  the  following  A,  B,  P,  C,  3,  and  5  represent 
different  speech  files.  Files  A,  B,  P,  and  C  are  speech 
files  at  no  G-stress  (control  conditions).  Files  3  and  5 
are  speech  files  at  3g  and  5g  respectively. 


Table IV

RECOGNITION RESULTS

EXPERIMENT /               TRAINING       RECOGNITION
PHONEME LENGTH /           FILES (45)     FILES (15 ea.)
DISTANCE RULE                             C         3

1 / 5 / M1                 100%           93%       100%
2 / 1 / M1                  98%           93%        53%
3 / 1 / M2                  93%           93%        27%
4 / 1 / M2                  91%           93%        47%

(experiment 4 used different fuzzy variables and
phoneme representations for some words)


Tables V thru VIII list the recognition scores for all the words and experiments. Scores were similar for the five-vector and one-vector phoneme templates. Recognition results suffer when words other than the one spoken score higher than the word actually spoken.
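
The scoring rule in the preceding paragraph (a word is missed whenever some other vocabulary word scores higher than the word actually spoken) can be illustrated as follows. The sketch is an assumption about how the tabulated percentages are formed, not code taken from program LEARN; the names `recognize` and `percent_correct` and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def recognize(scores: Dict[str, float]) -> str:
    """Return the vocabulary word with the highest recognition score."""
    return max(scores, key=scores.get)


def percent_correct(trials: List[Tuple[str, Dict[str, float]]]) -> float:
    """Percentage of files whose top-scoring word is the word spoken."""
    hits = sum(1 for spoken, scores in trials if recognize(scores) == spoken)
    return 100.0 * hits / len(trials)


# Hypothetical scores for two files; the second is missed because
# "five" outscores the spoken word "nine".
trials = [
    ("eight", {"eight": 0.83, "step": 0.61, "threat": 0.58}),
    ("nine", {"nine": 0.65, "five": 0.70, "one": 0.60}),
]
print(percent_correct(trials))
```

With fifteen files per column, each missed word lowers the percentage by about 6.7 points, which is why the tables show values such as 93.3 and 86.7.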

Studying the input files to the recognition system (program LEARN) gave insight into why the scores increased for the unwanted words. For G-speech the five-vector phoneme templates gave a more consistent output than the single-vector phoneme templates.

Figure 22 (five-vector) and Figure 23 (one-vector) are the output files from the feature extraction system for the word "eight" at five G's. Over the first five vectors, which cover the same time span in both files, the five-vector file contains five different phonemes among its top five choices, whereas the single-vector file in Figure 23 shows twelve different phonemes. In addition, the five-vector phonemes are more consistent in the representation.

The recognition algorithm looks at the single-vector phoneme template feature extraction file (the word "eight" as represented by Figure 23) and tries to score each of the vocabulary words against this file. The single-vector files are less consistent, and therefore the other words score higher than they do with five-vector files, based on what the

TABLE V

RECOGNITION SCORES FOR EXPERIMENT 1

Phoneme Length: 5 vector
Distance Rule:  M1

                                FILES
WORDS TO BE           TRAINING SET         RECOGNITION SET
RECOGNIZED       A       B       P       C       3       5

ZERO             .76     .83     .80     .80     .73     .67
ONE              .85     .83     .81     .84     .64     .62*
TWO              .78     .86     .88     .87     .71     .74
THREE            .86     .73     .86     .78     .82     .67
FOUR             .85     .74     .87     .66*    .72     .45*
FIVE             .83     .86     .87     .82     .68     .64
SIX              .84     .88     .88     .85     .76     .68
SEVEN            .84     .78     .85     .80     .68     .60*
EIGHT            .86     .89     .88     .86     .84     .83
NINE             .82     .83     .87     .82     .76     .76
CCIP             .82     .86     .80     .76     .76     .77
ENTER            .85     .84     .84     .84     .78     .75
FREQUENCY        .82     .77     .65     .82     .65     .69
STEP             .82     .88     .88     .85     .79     .69
THREAT           .82     .75     .83     .71     .73     .70

Percent Correct  100     100     100     93.3    100     80
MEAN                     .829            .805    .788    .684
STANDARD
DEVIATION                .049            .058    .061    .089

*Word missed


TABLE VI

RECOGNITION SCORES FOR EXPERIMENT 2

Phoneme Length: 1 vector
Distance Rule:  M1

                                FILES
WORDS TO BE           TRAINING SET         RECOGNITION SET
RECOGNIZED       A       B       P       C       3       5

ZERO             .81     .86     .74     .82     .64*    .63*
ONE              .83     .86     .80     .81     .64     .60*
TWO              .81     .84     .86     .83     .65*    .63*
THREE            .86     .81     .80     .76     .76     .67*
FOUR             .79     .83     .84     .68*    .69*    .54*
FIVE             .85     .88     .83     .85     .56*    .62*
SIX              .82     .84     .83     .78     .67*    .68*
SEVEN            .84     .79     .81     .77     .71*    .69*
EIGHT            .89     .89     .92     .87     .83     .79
NINE             .83     .83     .80     .82     .68*    .65*
CCIP             .77     .81     .82     .80     .75     .73
ENTER            .77     .78     .81     .76     .75     .70
FREQUENCY        .78     .84     .77     .79     .75     .72
STEP             .84     .89     .79*    .81     .75     .69*
THREAT           .71     .81     .83     .75     .73     .74

Percent Correct  100     100     93.3    93.3    53.3    33.3
MEAN                     .822            .793    .704    .672
STANDARD
DEVIATION                .041            .046    .067    .063

*Word missed
TABLE VII


RECOGNITION SCORES FOR EXPERIMENT 3

Phoneme Length: 1 vector
Distance Rule:  M2

                                FILES
WORDS TO BE           TRAINING SET         RECOGNITION SET
RECOGNIZED       A       B       P       C       3       5

ZERO             .81     .85     .70*    .82     .65*    .6?
ONE              .83     .84     .80     .81     .65     .6?
TWO              .81     .76     .85     .77     .58*    .6?
THREE            .86     .75*    .80     .74     .74*    .6?
FOUR             .82     .82     .84     .68*    .71*    .61
FIVE             .81     .85     .88     .81     .69*    .6?
SIX              .85     .85     .86     .82     .72*    .6?
SEVEN            .83     .79     .80     .78     .65*    .6?
EIGHT            .87     .88     .92     .89     .84     .8?
NINE             .77     .78     .77*    .78     .65*    .6?
CCIP             .76     .80     .82     .79     .78     .7?
ENTER            .78     .77     .81     .76     .76     .??
FREQUENCY        .77     .81     .79     .80     .73*    .7?
STEP             .83     .84     .89     .82     .68*    .6?
THREAT           .80     .80     .78     .78     .75     .7?

Percent Correct  100     93.3    86.7    93.3    26.7    26.7
MEAN                     .816            .79     .705    .6?
STANDARD
DEVIATION                .042            .046    .065    .0?

*Word missed


TABLE VIII

RECOGNITION SCORES FOR EXPERIMENT 4

Phoneme Length: 1 vector (8 ms)
Distance Rule:  M2

                                FILES
WORDS TO BE           TRAINING SET         RECOGNITION SET
RECOGNIZED       A       B       P       C       3       5

ZERO             .89     .88     .84     .86     .75     .7?
ONE              .85     .87     .82     .85     .71*    .7?
TWO              .85     .84     .83     .81     .66*    .7?
THREE            .86     .75*    .80     .74*    .74*    .6?
FOUR             .88     .86     .84     .77     .71*    .6?
FIVE             .81     .89     .88     .85     .69     .6?
SIX              .87     .87     .74*    .84     .70*    .7?
SEVEN            .85     .86     .81     .82     .70*    .61
EIGHT            .87     .88     .90     .88     .85     .8?
NINE             .89     .88     .68*    .89     .63*    .6?
CCIP             .76*    .84     .89     .82     .82     .7?
ENTER            .86     .85     .87     .82     .84     .7?
FREQUENCY        .83     .84     .82     .82     .80     .7?
STEP             .86     .86     .86     .82     .76     .72
THREAT           .83     .80     .84     .77     .75*    .75

Percent Correct  93.3    93.3    86.7    93.3    46.7    26.7
MEAN                     .843            .824    .74     .72
STANDARD
DEVIATION                .044            .041    .065    .05

*Word missed


recognition  algorithm  expects  to  see  because  of  training 
(past  statistics).  This  conclusion  is  supported  by 
Montgomery's  thesis  when  he  discusses  accuracy  being  higher 
when  the  acoustic  analyzer  output  is  more  consistent  (Ref 
3:5).  In  Figures  24  thru  26  similar  results  can  be  seen  for 
normal  speech. 

Distance  Rule 

Two distance rules were analyzed in this research, the M1 distance and the M2 distance. The two were compared using single-vector phoneme template feature extraction results. These results initially point to the M1 distance performing better than the M2 distance. However, the differences between the two are not as great as the differences seen between five-vector and one-vector phoneme templates, and it is hard to distinguish between the M1 and M2 distances.
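
This section does not restate the definitions of the two rules. Assuming that M1 and M2 denote the Minkowski distances of order 1 (sum of absolute differences) and order 2 (Euclidean distance), which is an assumption and not something this section confirms, the comparison of a spectral vector against a set of phoneme templates could be sketched as below. The function names and the toy templates are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def m1_distance(x: Vector, y: Vector) -> float:
    """Assumed M1 rule: sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def m2_distance(x: Vector, y: Vector) -> float:
    """Assumed M2 rule: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_template(
    vector: Vector,
    templates: Dict[str, Vector],
    distance: Callable[[Vector, Vector], float],
) -> str:
    """Name of the template closest to the vector under the given rule."""
    return min(templates, key=lambda name: distance(vector, templates[name]))


# Toy two-component templates, for illustration only.
templates = {"a": [1.0, 0.0], "t": [0.0, 1.0]}
observed = [0.8, 0.3]
print(nearest_template(observed, templates, m1_distance))
print(nearest_template(observed, templates, m2_distance))
```

Under this assumption the M1 rule avoids the multiplications and the square root of the M2 rule, which is consistent with the computational advantage the thesis attributes to M1.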

Figure 25 and Figure 26 are feature extraction files for the M1 and M2 distance rules, respectively. The two files have only minor differences. In fact, there are only one or two differences between the vectors shown in the top choice. The second, third, fourth and fifth choices have more differences; still, no significant difference is found between the M1 and M2 distances when analyzing the feature extraction system.
Figure 23. "Eight" (5g)
1-vector Phonemes
M1 distance


Phoneme Averaging


Phoneme averaging was used extensively in this research project. Phonemes in the five-vector phoneme templates were averaged whenever possible. The word "eight" was represented by only two phonemes, averaged from all the "a" sounds and all the "t" sounds respectively. The averaging used for "eight" was successful, as reflected in the 100% recognition of the word "eight" across the board by all the feature extraction processes in the body of this thesis. In addition, only one "n" sound was used in this research. In previous research, Seelandt used a separate "n" sound for each word in the vocabulary where an "n" sound occurred. In this research a single "n" template was averaged from every "n" sound in the vocabulary. The "n" sound performed well and was identified consistently throughout the feature extraction files. The one-vector phoneme templates were all averaged. Usually five or more vectors were averaged into each single-vector phoneme. The single-vector phonemes also included twelve averaged noise templates.
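
A minimal sketch of template averaging follows, assuming that an averaged template is the component-by-component mean of several example vectors of equal length. The thesis does not spell out the averaging arithmetic in this section, so the function `average_template` and the sample values are hypothetical.

```python
from typing import List, Sequence


def average_template(examples: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of example spectral vectors of equal length."""
    if not examples:
        raise ValueError("at least one example vector is required")
    length = len(examples[0])
    if any(len(vector) != length for vector in examples):
        raise ValueError("all example vectors must have the same length")
    count = len(examples)
    # zip(*examples) walks the examples one frequency component at a time.
    return [sum(column) / count for column in zip(*examples)]


# Three toy "n" sounds taken from different words, four components each.
n_examples = [
    [0.2, 0.6, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.3, 0.5, 0.1, 0.1],
]
n_template = average_template(n_examples)
print([round(value, 3) for value in n_template])
```

A five-vector template could be averaged the same way by treating its five vectors as one concatenated vector.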
Figure 24. "Eight" no G-stress
5-vector Phonemes

Figure 25. "Eight" no G-stress
1-vector Phonemes
M1 distance

Figure 26. "Eight" no G-stress
1-vector Phonemes
M2 distance rule


VI.  Conclusions  and  Recommendations 


Conclusions 

There are three main conclusions to be drawn from this research, which involved five-vector and one-vector phoneme templates, distance rules, and the averaging of phoneme templates. The five-vector phoneme template did significantly better in the recognition results for G-speech and was more consistent in the feature extraction process than the one-vector phoneme template. Single-vector phonemes do have a computational advantage over the five-vector templates, but this advantage does not overcome the disadvantage of degraded recognition discussed above.

The M1 and M2 distance rules studied showed little difference in feature extraction output. Even though results showed the M1 distance to perform slightly better on normal speech and 40% better on one set of G-speech files, adjusting the fuzzy variables and changing the phoneme representations (experiment 4) led to better results for the M2 distance. In addition, the recognition scores for the M1 and M2 distances showed little difference. Thus it seems that the M1 distance rule, which can have a 50% computational advantage in number of actual operations, can be used with results equal to or better than the M2 rule.

Phoneme averaging reduced the number of phonemes needed per word. This is the first research project based on Seelandt's techniques to use averaged phoneme templates. When averaged phonemes were used for the word "eight", only half the number of phonemes that Seelandt used were needed. In addition, the feature extraction based on the averaged phonemes for the word "eight" produced more consistent output than the multiple unaveraged phonemes.

Recommendations 

The first recommendation covers data acquisition. This thesis used G-speech and normal speech to analyze the feature extraction and recognition algorithm used at the AFIT Signal Processing Laboratory. However, the G-speech obtained was not sufficient in quantity to establish meaningful baseline results for G-speech. More G-speech needs to be collected, or G-speech that has already been processed should be used. In addition, actual aircraft speech should be obtained if possible in future projects, since the noise level is significantly higher than in speech obtained in the centrifuge.

Another  study  may  want  to  investigate  the  use  of 
G-speech  templates.  Different  templates  could  be  used  that 
correspond  to  different  G-levels.  The  G-speech  templates 
could  be  implemented  in  a  real  aircraft  by  using  the  output 
of  the  G-meter  to  select  the  corresponding  G-template.  The 
only  drawback  is  that  different  sets  of  templates  would  have 
to  be  made  and  stored. 
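
The template selection suggested above could look like the following sketch. It assumes one stored template set per nominal G-level and picks the set whose level is closest to the G-meter reading; the function `select_template_set`, the dictionary layout, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

TemplateSet = Dict[str, List[float]]


def select_template_set(
    g_reading: float, template_sets: Dict[float, TemplateSet]
) -> Tuple[float, TemplateSet]:
    """Pick the stored template set whose nominal G-level is nearest."""
    if not template_sets:
        raise ValueError("no template sets stored")
    level = min(template_sets, key=lambda g: abs(g - g_reading))
    return level, template_sets[level]


# Toy template sets for 1g (control), 3g and 5g, one phoneme each.
template_sets = {
    1.0: {"n": [0.2, 0.7]},
    3.0: {"n": [0.3, 0.6]},
    5.0: {"n": [0.4, 0.5]},
}
level, templates = select_template_set(4.2, template_sets)
print(level)
```

The storage cost noted in the text shows up here directly: every additional G-level adds a complete copy of the template set.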

It is also recommended that extensive use of the array processor be made in the future for the algorithms that process speech and produce the recognition results. Efficient use of the array processor could lead to shorter turnarounds for results. In this study the recognition of 45 files could take up to 12 hours to run on the Data General Eclipse (using the recognition program LEARN). This does not include the run time for feature extraction on the same 45 files.


Software  developed  in  this  research  and  in  the  research 
done  by  Martin  (Ref  2)  makes  the  energy  available  to  the 
recognition  routine.  However,  the  recognition  routine  did 
not  use  the  energy  in  this  research.  Future  researchers  may 
find  energy  to  be  useful  in  the  recognition  of  stops  found 
in  words  and  for  thresholding. 

This research concluded that the five-vector phoneme template feature extraction system outperformed the single-vector feature extraction for G-speech. This points to the need to study variable length phoneme templates to find the optimal length for feature extraction. Also, many of the differences found among the phoneme sounds in speech utterances, even in the same person's speech, can be attributed to minor frequency shifts, which degrade feature extraction performance. A dynamic frequency sliding algorithm, which would attempt to slide the phoneme template up and down the frequency components within a certain tolerance to find the best match, may be effective in improving the feature extraction system.
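
One way the proposed frequency sliding could work is sketched below. This is not an algorithm from the thesis: the function `slide_match` is hypothetical, and the mean absolute difference over the overlapping components is an assumed match score chosen only to keep the example short.

```python
from typing import Optional, Sequence, Tuple


def slide_match(
    template: Sequence[float], observed: Sequence[float], max_shift: int
) -> Optional[Tuple[int, float]]:
    """Try shifts of -max_shift..+max_shift frequency components.

    Returns (best_shift, best_score), where the score is the mean absolute
    difference over the components that overlap at that shift.
    """
    best: Optional[Tuple[int, float]] = None
    for shift in range(-max_shift, max_shift + 1):
        total = 0.0
        count = 0
        for i, value in enumerate(template):
            j = i + shift  # observed component lined up with template[i]
            if 0 <= j < len(observed):
                total += abs(value - observed[j])
                count += 1
        if count == 0:
            continue  # no overlap at this shift
        score = total / count
        if best is None or score < best[1]:
            best = (shift, score)
    return best


# The observed spectrum is the template moved up by one component.
template = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
observed = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(slide_match(template, observed, max_shift=2))
```

The tolerance `max_shift` would have to stay small, since a large tolerance lets different phonemes slide onto one another.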
Appendix A

Speech  Files  for  Thesis 


Speech files were created using AUDIOHIST on the NOVA. Digitized files are stored on magnetic tape (MT0).

Tape  #1  contains  the  files  listed  in  this  appendix. 


File Name Legend

HCA0.SP    H    speaker's name
           C    control or static conditions
           A    # of utterance (A-E = 1-5; P = prerun, S = postrun)
           0    word spoken ("zero")
           SP   speech file

H30S.SP    H    speaker's name
           3    g level in z direction
           0    g level in y direction (A,C = +1.5g; B = -1.5g; G = 0g)
           S    word spoken (S = "step")
           SP   speech file
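
The naming scheme in the legend can be decoded mechanically. The sketch below is illustrative only: the function `parse_speech_name` is hypothetical, the word codes C, E, F, S and T are taken from the pre-run word list in this appendix, and treating a "0" in the third position as 0g lateral is an inference from the "3GZ, 0GY" comment in the G-speech listing.

```python
from typing import Dict, Union

WORD_CODES = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "C": "CCIP", "E": "enter", "F": "frequency", "S": "step", "T": "threat",
}

# Lateral (y direction) g codes; "0" is an inferred alias for 0g.
GY_CODES = {"A": 1.5, "C": 1.5, "B": -1.5, "G": 0.0, "0": 0.0}


def parse_speech_name(name: str) -> Dict[str, Union[str, int, float]]:
    """Decode names such as HCA0.SP (control) or H30S.SP (G-stress)."""
    stem, _, extension = name.partition(".")
    if len(stem) != 4 or extension.upper() != "SP":
        raise ValueError("unexpected file name: " + name)
    speaker, second, third, word_code = stem.upper()
    info: Dict[str, Union[str, int, float]] = {
        "speaker": speaker,
        "word": WORD_CODES[word_code],
    }
    if second == "C":
        info["condition"] = "control"
        info["utterance"] = third       # A-E, or P (prerun) / S (postrun)
    else:
        info["condition"] = "g-stress"
        info["gz"] = int(second)        # g level in z direction
        info["gy"] = GY_CODES[third]    # g level in y direction
    return info


print(parse_speech_name("HCA0.SP"))
print(parse_speech_name("H30S.SP"))
```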


Captain  Henwood  27  Apr  82  1420  hrs 


Digitized  speech  stored  on  MT0:2.  Conditions: 
Centrifuge  test  with  F-16  seat,  30  degree  bank  angle  with 
lateral  shoulder  pads,  pitch  axis  tracking  task. 


FILE       MAX v     EDIT BLOCKS

HCA0.SP    2.89      30
HCB0.SP    3.59      30
HCC0.SP    4.01      30
HCD0.SP    4.33      30
HCA1.SP    4.60      30
HCB1.SP    4.39      30
HCC1.SP    4.54      30
HCD1.SP    4.07      30
HCE1.SP    4.49      30
HCA2.SP    4.47      30
HCB2.SP    4.31      30
HCC2.SP    4.60      30
HCD2.SP    4.17      30
HCE2.SP    4.10      30
HCA3.SP    4.77      30
HCB3.SP    4.27      30
HCC3.SP    4.48      30
HCD3.SP    4.16      30
HCE3.SP    3.85      30
HCA4.SP    4.19      30
HCB4.SP    3.98      30

COMMENTS 

"ZERO" 

"  ONE  " 

.39v  noise  (breathing) 
•  81 v  noise  (breathing) 

.04v  noise  (typical) 

.26v  max  noise 
.Slv  breathing  noise 

noise 


HCC4.SP  3.86  30 

HCD4.SP  3.95  30 

HCE4.SP  3.88  30 

HCA5.SP  4.07  30 

HCB5.SP  3.57  30 

HCC5.SP  3.89  30 

HCD5.SP  3.76  30 

HCE5.SP  3.83  30 

HCA6.SP  3.79  30 

HCB6.SP  4.01  30 

HCC6.SP  3.99  30 

HCD6.SP  3.93  30 

HCE6.SP  4.02  30 

HCA7.SP  4.05  30 

HCB7.SP  3.61  30 

HCC7.SP  3.79  30 

HCD7.SP  4.06  30 

HCE7.SP  3.80  30 

HCA8.SP  3.76  30 

HCB8.SP  4.28  30 

HCC8.SP  3.91  30 

HCD8.SP  4.08  30 

HCE8.SP  3.86  30 

HCA9.SP  3.72  30 

HCB9.SP  3.50  30 

HCC9.SP  3.99  30 

HCD9.SP  3.93  30 

HCE9.SP  3.72  30 

HCAF.SP  3.95  30 

HCBF.SP  3.98  30 

HCCF.SP  3.87  30 

HCDF.SP  3.96  30 

HCEF.SP  4.09  30 

HCAE.SP  3.99  30 

HCBE.SP  3.87  30 

HCCE.SP  3.82  30 

HCDE.SP  3.97  30 

HCEE.SP  4.08  30 

HCAC.SP  3.75  40 

HCBC.SP  3.94  40 

HCCC.SP  3.55  40 

HCDC.SP  4.04  40 

HCEC.SP  3.95  40 

HCAT.SP  4.08  30 

HCBT.SP  3.72  30 

HCCT.SP  3.75  30 

HCDT.SP  4.16  30 

HCET.SP  4.11  30 

HCAS.SP  4.07  30 

HCBS.SP  4.07  30 

HCCS.SP  4.08  30 

HCDS.SP  4.08  30 

HCES.SP  3.59  30 


•25v  noise 




Capt Henwood, Pre-run Static List of Words

FILE       MAX v     EDIT BLOCKS   COMMENTS

HCPC.SP    3.85      40            "CCIP"
HCPE.SP    3.78      40            "ENTER"
HCPF.SP    3.86      40            "FREQUENCY"
HCPS.SP    3.94      40            "STEP"
HCPT.SP    3.96      40            "THREAT"
HCP0.SP    3.84      40
HCP1.SP    4.00      40
HCP2.SP    4.22      40
HCP3.SP    4.13      40
HCP4.SP    4.24      40
HCP5.SP    4.06      40
HCP6.SP    4.19      40
HCP7.SP    4.28      40
HCP8.SP    3.93      40
HCP9.SP    4.08      40

Captain Henwood G-Speech

FILE       MAX v     EDIT BLOCKS   COMMENTS

H300.SP    4.29      40            3GZ, 0GY
H301.SP    4.97      40
H302.SP    4.55      40
H303.SP    4.16      40
H304.SP    4.68      40
H305.SP    4.36      40
H306.SP    4.19      40
H307.SP    4.25      40
H308.SP    4.60      40
H309.SP    3.96      40
H30C.SP    4.26      40
H30E.SP    4.08      40
H30F.SP    4.17      40
H30S.SP    3.84      40
H30T.SP    4.47      40
H500.SP    4.39      40            5GZ, 0GY
H501.SP    4.71      30
H502.SP    5(1)      30
H503.SP    4.28      25
H504.SP    4.81      40
H505.SP    4.38      40
H506.SP    4.41      40
H507.SP    4.38      30
H508.SP    4.93      40
H509.SP    4.01      40
H50C.SP    4.32      40
H50E.SP    3.86      30
H50F.SP    4.81      25
H50S.SP    4.74      40
H50T.SP    4.65      40


Capt C. St Sauver

FILE       MAX v     EDIT BLOCKS   COMMENTS

SCA0.SP    3.17      30
SCB0.SP    3.38      30
SCC0.SP    3.36      30
SCD0.SP    4.00      30
SCE0.SP    3.76      30
SCA1.SP    4.44      30
SCB1.SP    4.22      30
SCC1.SP    3.90      30
SCD1.SP    4.43      30
SCE1.SP    4.18      30
SCA2.SP    4.26      30
SCB2.SP    4.26      30
SCC2.SP    3.67      30
SCD2.SP    3.67      30
SCE2.SP    3.99      30
SCA3.SP    4.00      30
SCB3.SP    4.54      30
SCC3.SP    4.11      30
SCD3.SP    4.07      30
SCE3.SP    4.08      30
SCA4.SP    4.07      30
SCB4.SP    3.74      30
SCC4.SP    3.83      30            .14v
SCD4.SP    3.93      30
SCE4.SP    4.00      30            .12v
SCA5.SP    4.02      30
SCB5.SP    4.53      30
SCC5.SP    4.40      30
SCD5.SP    4.71      30
SCE5.SP    4.47      30
SCA6.SP    4.88      30            .47v pre-noise
SCB6.SP    3.67      30
SCC6.SP    4.77      30
SCD6.SP    4.93      30
SCE6.SP    5.00(2)   30
SCA7.SP    5.00(4)   30            .18v noise (MAX)
SCB7.SP    4.31      30
SCC7.SP    4.61      30
SCD7.SP    4.45      30
SCE7.SP    4.70      30
SCA8.SP    3.98      30
SCB8.SP    4.13      30




FILE       MAX v     EDIT BLOCKS   COMMENTS

SCC8.SP    4.11      30
SCD8.SP    4.03      30
SCE8.SP    3.83      30
SCA9.SP    4.16      30
SCB9.SP    4.35      30
SCC9.SP    3.95      30
SCD9.SP    4.08      30
SCE9.SP    3.93      30
SCAC.SP    4.23      40
SCBC.SP    4.20      40
SCCC.SP    4.11      40
SCDC.SP    4.19      40
SCEC.SP    4.21      40
SCAE.SP    4.47      30
SCBE.SP    4.49      30
SCCE.SP    4.44      30
SCDE.SP    4.31      30
SCEE.SP    4.32      30
SCAF.SP    3.89      30
SCBF.SP    3.85      30
SCCF.SP    3.78      30
SCDF.SP    3.69      30
SCEF.SP    3.65      30
SCAS.SP    4.71      30
SCBS.SP    4.58      30
SCCS.SP    4.44      30
SCDS.SP    4.48      30
SCES.SP    4.54      30
SCAT.SP    4.65      30
SCBT.SP    4.43      30
SCCT.SP    4.57      30
SCDT.SP    4.50      30
SCET.SP    3.76      30
S30E.SP    4.46      30
S305.SP    4.72      30
S30T.SP    4.54      30
S309.SP    4.62      30
S304.SP    4.81      30
S301.SP    4.41      30
S300.SP    4.50      30
S30F.SP    4.56      30
S306.SP    4.70      30
S30S.SP    4.64      30
S302.SP    4.80      30
S307.SP    4.74      30
S502.SP    4.99      30
S503.SP    4.56      30
S50S.SP    4.64      30
S504.SP    4.88      30
S508.SP    4.62      30
S50C.SP    4.50      40
40 
FILE       MAX v     EDIT BLOCKS   COMMENTS

S501.SP    4.81      30
S50T.SP    4.68      30
S505.SP    4.22      30
S509.SP    3.99      30
S50E.SP    4.54      30
S50F.SP    4.41      30
S507.SP    4.70      30
S500.SP    4.55      30
S506.SP    4.41      30
S3B3.SP    4.29      30
S3B8.SP    4.24      30
S3B6.SP    4.43      30
S3B4.SP    4.24      30
S3B0.SP    4.43      30
S3B5.SP    4.21      30
S3B2.SP    4.31      30
S3B7.SP    4.41      30
S3BC.SP    3.90      40
S3BF.SP    4.45      40
S3BT.SP    4.20      30
S3BE.SP    4.21      30
S3B9.SP    4.17      30
S3B1.SP    4.39      30
S3BS.SP    4.51      30
S5A1.SP    4.17      40
S5A2.SP    4.32      30
S5AS.SP    4.28      30
S5A6.SP    4.24      30
S5AT.SP    4.13      30
S5A4.SP    4.25      30
S5AE.SP    4.40      30
S5C6.SP    4.59      30
S5AC.SP    4.10      40
S5CT.SP    4.22      30
S5A0.SP    4.45      30
S5CE.SP    4.52      30
S5CS.SP    4.10      30
S5A8.SP    3.97      30
S5C2.SP    4.30      30
S5B4.SP    4.24      30
S5B1.SP    4.05      30
S5A7.SP    4.54      30
S5A3.SP    4.07      30
S5A5.SP    3.88      30
S5AF.SP    3.92      30
S5A9.SP    4.01      30
S3AC.SP    3.84      40
S3AF.SP    3.74      30
S3A3.SP    3.56      30
S3AS.SP    4.16      30
S3A1.SP    3.84      30

Rename S5A-.- S5C-.-
FILE       MAX v     EDIT BLOCKS   COMMENTS

S3AT.SP    3.97      30
S3A2.SP    4.40      30
S3A5.SP    4.04      30
S3A9.SP    4.11      30
S3A4.SP    4.07      30
S3A7.SP    4.49      30
S3A6.SP    4.50      30
S3AE.SP    4.33      30
S3A8.SP    4.01      30
S3A0.SP    4.57      30
S5B2.SP    4.31      30
S5B9.SP    4.32      30
S5BC.SP    4.22      40
S5B7.SP    4.23      30
S5BF.SP    4.38      30
S5B4.SP    4.20      30
S5BE.SP    4.29      30
S5B8.SP    3.84      30
S5B5.SP    4.24      30
S5BT.SP    4.23      30
S5B0.SP    4.24      30
S5B3.SP    3.92      30
S5BS.SP    4.55      30
S5B1.SP    3.99      30
S5B6.SP    4.39      30
S3G1.SP    4.16      30
S3G6.SP    4.43      30
S3G8.SP    4.27      30

COMMENTS
noise up to .97
.72v noise


FILE       MAX v     EDIT BLOCKS

S3G7.SP    4.05      30
S3GS.SP    4.02      30
S3GF.SP    4.20      30
S3GC.SP    3.91      40
S3GT.SP    4.28      30
S3G0.SP    4.14      30
S3G3.SP    3.95      30
S3G2.SP    3.80      30
S3G9.SP    4.04      30
S3G5.SP    4.09      30
S3G4.SP    4.07      30
S3GE.SP    4.41      30
SCSC.SP    3.83      40
SCS5.SP    4.06      30


FILE       MAX v     EDIT BLOCKS

SCS5.SP    4.28      30
SCS6.SP    4.62      30
SCS4.SP    4.11      30
SCS8.SP    3.27      30
SCS3.SP    3.68      30
SCS0.SP    4.17      30
SCS7.SP    4.52      30
SCS1.SP    4.08      30
SCST.SP    4.31      30
SCS9.SP    3.82      30
SCSE.SP    3.62      30
SCSF.SP    3.25      30
SCS2.SP    3.20      30
Martin's program does a DFT of the digitized speech, and the size of the DFT can be specified in the program. However, program SPENPLOT will only print up to a 256 point DFT (128 frequency components) because of the 132 character limit on the Printronix model P300 printer. SPENPLOT sends the necessary symbols to the Printronix model P300 printer to create a spectrogram as seen in Figures B-2 thru B-11. SPENPLOT accepts input files which consist of a header (block 0 with 256 integers) followed by data blocks which contain 128 real numbers per block. A 64 point DFT has four vectors (8 milliseconds per vector) in each data block. SPENPLOT checks the header against the values listed in Table B-I.
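
The block arithmetic in the paragraph above can be checked with a short sketch. It assumes that each vector holds half as many frequency components as the DFT has points, which matches both figures quoted in the text (256 points give 128 components, and a 64 point DFT packs four vectors into a 128-real block). The function names are hypothetical, and the duration scaling assumes non-overlapping frames at a fixed sampling rate.

```python
REALS_PER_BLOCK = 128   # real numbers in one data block (from the text)
MAX_DFT_SIZE = 256      # largest DFT that SPENPLOT will print


def vectors_per_block(dft_size: int) -> int:
    """Number of spectral vectors stored in one 128-real data block."""
    if dft_size < 2 or dft_size > MAX_DFT_SIZE or dft_size % 2:
        raise ValueError("DFT size must be an even number from 2 to 256")
    components = dft_size // 2          # frequency components per vector
    if REALS_PER_BLOCK % components:
        raise ValueError("vector length does not divide the block evenly")
    return REALS_PER_BLOCK // components


def vector_duration_ms(dft_size: int) -> float:
    """Duration of one vector, scaled from 8 ms for a 64 point DFT."""
    return 8.0 * dft_size / 64


for size in (64, 128, 256):
    print(size, vectors_per_block(size), vector_duration_ms(size))
```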

The  values  listed  in  Table  B-I  help  prepare  the 
spectrograms.  The  header  of  the  spectrogram  is  filled  out 
by  reading  the  header  (block  0)  of  the  data  input  files.  In 
program  DRVR  the  dc  component  of  the  spectrum  was  replaced 
by  the  energy  per  vector  before  normalization.  SPENPLOT  can 
accept  information  that  has  been  normalized  or  not 
normalized  by  DRVR.  The  program  SPENPLOT  listed  in  this 
appendix  will  only  give  a  scale  for  64  point  and  128  point 
DFTs.  However,  the  spectrum  will  be  created  for  lower  DFT 
sizes  and  up  to  256  point  DFTs.  DFT  sizes  greater  than  256 
will  cause  erroneous  output  from  SPENPLOT. 

The source code that follows will allow regular interactive use when compiled using the FORTRAN/X statement. This program will be loaded with the relocatable binary for SPENPLOT and subroutines BYTEOUT, IOFT5 and the FORTRAN library. Program SPENPLOT was developed from spectrogram programs found in Seelandt's thesis (Ref A). See the Printronix manual for how to use the plot mode as used in SPENPLOT for the spectrogram plot.


Table B-I

Header Values used for Program DRVR and SPENPLOT

ELEMENT    CONTENTS

1-13       Observation file name (channel 4)
14-26      Speech file name (channel 5)
27         Switch: 1=preemphasize  0=don't preemphasize
28         Preemphasis slope
29         Preemphasis corner frequency
30         Number of time points per FFT
31         Switch: 1=Hamming window  0=rectangular window
32         Normalization: 1 = normalize to unity
                          2 = no normalization
                          0 = divide by vector energy
33         Switch: 1=create test file  0=don't create
34-53      not used
54         Vector length of phonemes
55         Number of first time slice in file
56         Number of last time slice in file
57         Number of points per time slice in file
58         Switch: 1=overlapping  0=non-overlapping
59         Number of disk blocks in observation file
60         Switch: 1=deemphasis  0=no deemphasis
61         Deemphasis slope
62         Deemphasis corner frequency
*63        Switch: 1=phoneme file  0=not phoneme file
64-256     Used to store times phoneme has been modified
           (Can only store 193 modification numbers.)

*  Added for use by program MKPHON, which makes
   phoneme templates.

** Entry 32 is used by program SPENPLOT, so
   proper spectrogram is plotted.
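
As an illustration of the header layout in Table B-I, the sketch below maps the 1-based element numbers onto named fields. The field names and the function `decode_header` are hypothetical, and the file name elements are returned as raw integers because the table does not say how characters are packed into them.

```python
from typing import Dict, List, Union

HEADER_LENGTH = 256

# 1-based element numbers from Table B-I mapped to illustrative names.
SCALAR_FIELDS = {
    27: "preemphasis_switch",
    28: "preemphasis_slope",
    29: "preemphasis_corner_frequency",
    30: "points_per_fft",
    31: "hamming_window_switch",
    32: "normalization",
    33: "create_test_file_switch",
    54: "phoneme_vector_length",
    55: "first_time_slice",
    56: "last_time_slice",
    57: "points_per_time_slice",
    58: "overlapping_switch",
    59: "observation_disk_blocks",
    60: "deemphasis_switch",
    61: "deemphasis_slope",
    62: "deemphasis_corner_frequency",
    63: "phoneme_file_switch",
}


def decode_header(header: List[int]) -> Dict[str, Union[int, List[int]]]:
    """Split a 256-integer header block into the fields of Table B-I."""
    if len(header) != HEADER_LENGTH:
        raise ValueError("header block must hold 256 integers")
    fields: Dict[str, Union[int, List[int]]] = {
        name: header[element - 1] for element, name in SCALAR_FIELDS.items()
    }
    fields["observation_file_name"] = header[0:13]    # elements 1-13
    fields["speech_file_name"] = header[13:26]        # elements 14-26
    fields["modification_times"] = header[63:256]     # elements 64-256
    return fields


# Toy header: 64 point FFT, normalized to unity, everything else zero.
header = [0] * HEADER_LENGTH
header[30 - 1] = 64
header[32 - 1] = 1
decoded = decode_header(header)
print(decoded["points_per_fft"], decoded["normalization"])
```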
C     PROGRAM:   SPENPLOT
C     LANGUAGE:  FORTRAN5
C     DATE:      7 OCT 82
C     AUTHOR:    K. BEACHY
C     SUBJECT:   SPEECH, PLOTS SPECTRUM AND ENERGY
C
C     COMPILE:   FORTRAN/X SPENPLOT   (FOR REGULAR VERSION)
C                FORTRAN SPENPLOT     (FOR AUTO VERSION)
C                (RENAME THE AUTO PROGRAM)
C     LOAD:      RLDR SPENPLOT BYTEOUT IOFT5 @FLIB@
C                RLDR (AUTONAME) BYTEOUT IOFT5 @FLIB@
C
C
C     - THIS PROGRAM CREATES A SPECTROGRAM FROM THE
C     OBSERVATION FILES CREATED BY DRVR.
C     These observation files contain frequency components from
C     the speech files that have been DFT by DRVR.
C     The input files consist of a HEADER block (256 integers),
C     followed by data blocks which contain 128 reals per block.
C     For a 64 point DFT there would be 4 vectors (8 ms in each vector)
C     in each block of data corresponding to the original speech file.
C

C     INITIALIZE VARIABLES
C
      INTEGER FILENM(13),SAID(50)
      INTEGER HEADER(256),STRBLK,NUMVEC,NUMCMP,BLKRD,VECBLK,SFAC
      DIMENSION ISYMBOL1(10),ISYMBOL6(10),IDATE(3)
      DIMENSION IBI(128),DSP(512),IENSYM(6),ISDSYM(6)
      DIMENSION ISYMBOL2(10),ISYMBOL3(10),ISYMBOL4(10),ISYMBOL5(10)
      COMMON/BLK/ISYMBOL1,ISYMBOL2,ISYMBOL3,ISYMBOL4,ISYMBOL5
      COMMON/BLK/ISYMBOL6,IENSYM,ISDSYM
      REAL TFAC
C

C  PLOTTER  CHARACTERS. 

C 

ICHAN  =  3 

IPL0T=005K  ;PLOT  COMMAND 

ILF=01 2K  ;PRINT  LINE  OF  DATA  JUST  SENT. 

IN0N=101K  ;line  components  of  time  axis 

IZER  =  100K  ;send  nothing  to  printer 

ICNE  =  107K  ;separates  single  vectors  on  time  axis 

ITENT  =  137K  ;sends  tens  markers  for  time  axis 

ITEN=177K  ; DASH  USED  FOR  SCALE  ON  SGRAM 

IBYTE  =  999 

IBLANK=0  ; EMPTY  CHARACTER. 

ICOUNT  =  0 

ESCALE  =  1500.0  ;energy  scale  increment 
SFAC  =  3500  ;changes  plot  intensity 
TFAC  =  10.0  ;scale  to  set  symbols 
C  TFAC  is  later  set  to  0.1  for  those  file  not  normalized 

C 

C  SPECTROGRAM  SYMBOLS. 

C 

      DATA ISYMBOL1/100K,100K,100K,122K,122K,122K,122K,166K,177K,177K/
      DATA ISYMBOL2/100K,122K,166K,166K,177K,177K,177K,177K,177K,177K/
      DATA ISYMBOL3/100K,100K,100K,100K,100K,122K,133K,133K,133K,177K/
      DATA ISYMBOL4/100K,100K,100K,122K,122K,122K,122K,166K,177K,177K/
      DATA ISYMBOL5/100K,122K,166K,166K,177K,177K,177K,177K,177K,177K/
      DATA ISYMBOL6/100K,100K,100K,100K,100K,122K,133K,133K,133K,177K/
      DATA IENSYM/101K,102K,104K,110K,120K,140K/
      DATA ISDSYM/101K,103K,107K,117K,137K,177K/

C 

C  INPUT  CONTROL  VARIABLES,  OPEN  FILES,  &  PRINT  HEADING  ON  SGRAM. 

C 

C  FOR  SPECTROGRAPH  GENERATION 

X     IF(ICOUNT.EQ.0) GO TO 3        ;dummy to compile and skip 2
      CALL IOFT5(2,M1,FILENM,SAID,I1,M2,I2,I3,I4)
      CALL OPEN(2,"FILE2",1,IER1)
      IF(IER1.NE.1) TYPE"ERROR ON OPEN, IER1=",IER1
X3    ACCEPT"ENTER FILE WHICH CONTAINS THE SPECTRAL COMPONENTS
X    / <15> FILENAME = "
X     READ(11,1) FILENM(1)
X1    FORMAT(S13)
X     ACCEPT"<15> WORD(OR SENTENCE) SPOKEN = "
X     READ(11,2) SAID(1)
X2    FORMAT(S50)
X     CALL OPEN(2,FILENM,1,IER1)
X     IF(IER1.NE.1)TYPE"ERROR ON OPEN, IER1=",IER1

C 

C  Read  header  and  set  up  program  to  make  proper  spectrogram 

C 

      CALL RDBLK(2,0,HEADER,1,IER2)
      IF(IER2.NE.1)TYPE"ERROR ON RDBLK, IER2=",IER2
      NUMVEC = (HEADER(56)-HEADER(55))+1
      IF(HEADER(58).EQ.0.OR.HEADER(63).EQ.1) GO TO 400
      NUMVEC = NUMVEC-1              ;then skip last vector
400   CONTINUE
      NUMCMP = HEADER(57)
      VECBLK = 128/NUMCMP
      BLKRD = INT((1/VECBLK)-0.1)+1
C  If no normalization was used reset scale
      IF(HEADER(32).EQ.2) TFAC=0.1
C 

C  Symbols  file  give  only  symbols,  no  header  on  output 

C 

C  CHECK TO SEE IF SYMBOLS FILE IS DESIRED.
X     ACCEPT "SEND SGRAM TO PRINTER? (Y=1,N=2): ",IREPLY
X     IF(IREPLY.NE.1) GO TO 580
      CALL FOPEN(3,"$LPT")
X     GO TO 590
X580  CALL FOPEN(3,"SYMBOLS")        ;XFER SYMBOLS $LPT,for plot
X590  CONTINUE
X     TYPE"ENTER SCALE FACTOR (3500 ok, lower values darken plot)"
X     ACCEPT "SCALE FACTOR TO SET SPECTRAL INTENSITY = ",SFAC

C 

C  Now  set  up  header  as  needed 

C 
      CALL DATE(IDATE,IER)
      IF(IER.NE.1) TYPE "ERROR ON DATE,IER=",IER
      CALL FGTIME(IHOUR,IMIN,ISEC)
      IF(HEADER(31).EQ.1) WRITE(3,581)
581   FORMAT(1X,"*** HAMMING WINDOW USED ***")
      IF(HEADER(31).EQ.0) WRITE(3,585)
585   FORMAT(1X,"*** RECTANGULAR WINDOW USED ***")
      WRITE(3,589) FILENM(1),SAID(1)
589   FORMAT(1X,"FILE: ",S13,"SENTENCE SPOKEN: ",S50)
      WRITE(3,592) IDATE,IHOUR,IMIN,ISEC
592   FORMAT(1X,"DATE: ",2I3,I5,3X,"TIME: ",3I3)
      WRITE(3,594) HEADER(55),HEADER(56)
594   FORMAT(1X,"FIRST TIME SLICE =",I2," LAST TIME SLICE = "
      IF(HEADER(58).EQ.1) WRITE(3,583) HEADER(30)
583   FORMAT(1X,"DFT SIZE = ",I3,5X,"50% OVERLAP")
      IF(HEADER(58).EQ.0) WRITE(3,596) HEADER(30)
596   FORMAT(1X,"DFT SIZE = ",I3,5X,"NO OVERLAP")
      IF(HEADER(27).EQ.1) WRITE(3,597) HEADER(28),HEADER(29)
597   FORMAT(1X,"PREEMPHASIS = ",I2,"db/OCT ABOVE ",I3,"hz.")
      IF(HEADER(60).EQ.1) WRITE(3,598) HEADER(61),HEADER(62)
598   FORMAT(1X,"DEEMPHASIS = ",I2,"db/OCT BELOW ",I3,"hz.")
      IF(NUMCMP.EQ.64) GO TO 602
      IF(NUMCMP.EQ.32) GO TO 610
      GO TO 620


C
C  DFT size = 128 points. Send scales for 128pt DFT
C
602   WRITE(3,604)
604   FORMAT(30X,"FREQUENCY (HZ)",50X,"ENERGY")
      WRITE(3,606)
606   FORMAT(1X,"DC",13X,"1.K",13X,"2.K",13X,"3.K",13X,"4.K"
     / ,3X,"3K",7X,"93K",6X,"183K",6X,"273K",6X,"363K",6X,"453K"
     / ,6X,"543K")
      WRITE(3,608)

608  FORMATdX, "+ - + - + - + - + - + - " 

/  " - + - + - +",4X,"-t - + - + - + - + - " 

j  M  + _ -f- - -f - f - + - -f - (- _ +  »») 

      GO TO 620            ;finished with 64 component graph label
C
C  DFT size = 64 points, send proper scales
C
610   WRITE(3,612)
612   FORMAT(10X,"FREQUENCY (HZ)",36X,"ENERGY")
      WRITE(3,614)
614   FORMAT(1X,"DC",5X,"1.K",5X,"2.K",5X,"3.K",5X,"4.K",2X,"1.3K"
     / ,5X,"40.3K",5X,"79.3K",4X,"118.3K",4X,"157.3K",4X,"196.3K")
      WRITE(3,616)

616  FORMATd  X,"+ - + - + - + - +",4X,"+ - +" 

j  •* - 4- - -f - -j. - -f - \ - _ + _ ^ - +  "  ) 

620  CONTINUE 


C
C  Reset ESCALE for 64 point DFT. When ESCALE is changed
C  you must change the scale on the ENERGY scale formats.
C
      IF(HEADER(30).EQ.64) ESCALE = 650
C
C  MAIN SECTION OF PROGRAM
C
      CALL BYTEOUT(ICHAN,IBYTE)
      DO 1000 IM=1,NUMVEC
      STRBLK = INT((IM-1)/VECBLK)+1
      CALL RDBLK(2,STRBLK,DSP,BLKRD,IER3)
      IF(IER3.NE.1)TYPE"ERROR ON RDBLK, IER3=",IER3
C
C  The dc component has been replaced by the energy present in
C  the vector before the vector was normalized.
C
      IBI(1) = 1                     ;ignore dc component
      IOFF = MOD((IM-1),VECBLK)*NUMCMP
C
C  Set up energy for plot
C
      ENERGY = DSP(1+IOFF)
      ENMAX = 359.0*ESCALE
      IF(ENERGY.GT.ENMAX) ENERGY=ENMAX
C
C  Set up frequency components for plot
C
      DO 245 I=2,NUMCMP
      IBI(I)=INT(DSP(I+IOFF)/SFAC*TFAC)+1
      IF(IBI(I).LE.0) IBI(I)=1
      IF(IBI(I).GT.10) IBI(I)=10
245   CONTINUE
C
C  SAVE SYMBOLS THAT WILL CONSTRUCT THE SPECTROGRAM
C
      IF(ICOUNT.EQ.10.OR.ICOUNT.EQ.0) GO TO 248
      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,IONE)
      GO TO 249
248   CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,ITEN)
249   DO 250 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL1(JS))
250   CONTINUE
      DO 251 I=1,4
251   CALL BYTEOUT(ICHAN,IZER)
      IF(ICOUNT.EQ.10.OR.ICOUNT.EQ.0) GO TO 238
      CALL BYTEOUT(ICHAN,IONE)
      GO TO 239
238   CALL BYTEOUT(ICHAN,ITENT)
239   CONTINUE
      DO 254 IX=1,60
      DO 252 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 252
      CALL BYTEOUT(ICHAN,ISDSYM(IY))
      GO TO 256
252   CONTINUE
      CALL BYTEOUT(ICHAN,ITEN)
254   CONTINUE
256   CALL BYTEOUT(ICHAN,ILF)

      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,INON)
      DO 260 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL2(JS))
260   CONTINUE
      DO 261 I=1,4
261   CALL BYTEOUT(ICHAN,IZER)
      CALL BYTEOUT(ICHAN,INON)
      DO 264 IX=1,60
      DO 262 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 262
      CALL BYTEOUT(ICHAN,IENSYM(IY))
      GO TO 266
262   CONTINUE
      CALL BYTEOUT(ICHAN,IZER)
264   CONTINUE
266   CALL BYTEOUT(ICHAN,ILF)
      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,INON)
      DO 270 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL3(JS))
270   CONTINUE
      DO 271 I=1,4
271   CALL BYTEOUT(ICHAN,IZER)
      CALL BYTEOUT(ICHAN,INON)
      DO 274 IX=1,60
      DO 272 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 272
      CALL BYTEOUT(ICHAN,IENSYM(IY))
      GO TO 276
272   CONTINUE
      CALL BYTEOUT(ICHAN,IZER)
274   CONTINUE
276   CALL BYTEOUT(ICHAN,ILF)
      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,INON)
      DO 280 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL4(JS))
280   CONTINUE
      DO 281 I=1,4
281   CALL BYTEOUT(ICHAN,IZER)
      CALL BYTEOUT(ICHAN,INON)
      DO 284 IX=1,60
      DO 282 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 282
      CALL BYTEOUT(ICHAN,IENSYM(IY))
      GO TO 286
282   CONTINUE
      CALL BYTEOUT(ICHAN,IZER)
284   CONTINUE
286   CALL BYTEOUT(ICHAN,ILF)
      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,INON)
      DO 290 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL5(JS))
290   CONTINUE
      DO 291 I=1,4
291   CALL BYTEOUT(ICHAN,IZER)
      CALL BYTEOUT(ICHAN,INON)
      DO 294 IX=1,60
      DO 292 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 292
      CALL BYTEOUT(ICHAN,IENSYM(IY))
      GO TO 296
292   CONTINUE
      CALL BYTEOUT(ICHAN,IZER)
294   CONTINUE
296   CALL BYTEOUT(ICHAN,ILF)
      CALL BYTEOUT(ICHAN,IPLOT)
      CALL BYTEOUT(ICHAN,INON)
      DO 300 JJ=1,NUMCMP
      JS=IBI(JJ)
      CALL BYTEOUT(ICHAN,ISYMBOL6(JS))
300   CONTINUE
      DO 301 I=1,4
301   CALL BYTEOUT(ICHAN,IZER)
      CALL BYTEOUT(ICHAN,INON)
      DO 304 IX=1,60
      DO 302 IY=1,6
      IF(ENERGY.GT.FLOAT((6*IX+IY-7)*ESCALE)) GO TO 302
      CALL BYTEOUT(ICHAN,ISDSYM(IY))
      GO TO 306
302   CONTINUE
      CALL BYTEOUT(ICHAN,ITEN)
304   CONTINUE
306   IF(ICOUNT.NE.10) GO TO 310
      ICOUNT = 0
310   CALL BYTEOUT(ICHAN,ILF)
      CALL BYTEOUT(ICHAN,IBLANK)
C
C  Keep track of 10 vectors to be marked off.
C
      ICOUNT = ICOUNT+1
C
C  END OF SGRAM CONSTRUCTION
C
1000  CONTINUE
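
As a reading aid for the SPENPLOT listing, the following Python
sketch mirrors how each vector is quantized before it is plotted.
It is not part of the original FORTRAN 5 program. The function
names intensity_level and energy_bar are hypothetical; the default
values are taken from the SFAC, TFAC, and ESCALE settings in the
listing, and the loop follows the IX/IY energy-bar loops.

```python
def intensity_level(component, sfac=3500.0, tfac=10.0):
    """Map one spectral component to a symbol index from 1 to 10.

    Follows IBI(I) = INT(DSP/SFAC*TFAC) + 1, then clips to 1..10.
    Python's int() truncates toward zero, as FORTRAN INT does.
    """
    level = int(component / sfac * tfac) + 1
    return max(1, min(10, level))


def energy_bar(energy, escale=1500.0, width=60, dots=6):
    """Return (full_cells, symbol_index) for the energy bar.

    The bar is `width` plot bytes of `dots` dots each. The listing
    scans dot index d = 6*IX + IY - 7 (0..359) and stops at the
    first d whose threshold d*escale is not exceeded.
    """
    energy = min(energy, 359.0 * escale)      # same clip as ENMAX
    for d in range(width * dots):
        if energy <= d * escale:
            # d // dots full cells precede the final partial symbol;
            # d % dots + 1 is the IY index into ISDSYM or IENSYM.
            return d // dots, d % dots + 1
    return width - 1, dots                    # not reached after the clip


if __name__ == "__main__":
    print(intensity_level(1750.0))
    print(energy_bar(9000.0))
```

With the 128 point DFT setting (ESCALE = 1500) each plot byte on the
energy axis covers 6 * 1500 = 9000 energy units, which matches the
90K spacing of the ENERGY scale labels in FORMAT 606.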
APPENDIX C


Subroutines IOFT5 and BYTEOUT


Subroutine IOFT5

Subroutine IOFT5 was written by Lt Simmons. The version
presented here has been changed slightly: the sizes of the
F1 and F2 arrays were increased. IOFT5 passes information
from a macrofile to a program, so that automatic programs
could be run using macrofiles. In one case, use of IOFT5
saved hours of editing by passing the needed information to
the main program from a macrofile. IOFT5 can be used by any
program to send the needed switch information or ASCII
strings to the main program. It was used in program SPENPLOT
and program TOPS to aid automatic program execution using a
macrofile.


      SUBROUTINE IOFT5(N,MAIN,F1,F2,F3,MS,S1,S2,S3)
C
C  Written by Lt. Simmons 10 Sep 1981
C  Version 2
C
C  This FORTRAN 5 subroutine will read from the file
C  COM.CM (FCOM.CM in the foreground) the program name,
C  any global switches, and up to three local file
C  names and corresponding local switches.
C
C  Calling arguments:
C
C  N is the number of local files and switches to be
C  read from (F)COM.CM. N must be 1, 2, or 3.
C
C  MAIN is an ASCII array for the main program file name.
C
C  F1, F2, and F3 are the three ASCII arrays to return
C  the local file names.
C
C  MS is a two-word integer array that holds any global
C  switches.
C
C  S1, S2, and S3 are two-word integer arrays that
C  hold the local switches corresponding to F1 through
C  F3 respectively.
C
C  Dimension the arrays.
C
      DIMENSION MAIN(13),MS(2)
      INTEGER F1(13),F2(50),F3(7),S1(2),S2(2),S3(2)
C
C  Check the bounds on N.
C
      IF(N.LT.1.OR.N.GT.3)STOP "N out of bounds in IOFT5"
C
C  Process the data in (F)COM.CM
C
      CALL GROUND(I)              ;Find out which ground program is in
      IF(I.EQ.0)OPEN 0,"COM.CM"   ;Open ch. 0 to COM.CM
      IF(I.EQ.1)OPEN 0,"FCOM.CM"  ;Open ch. 0 to FCOM.CM
      CALL COMARG(0,MAIN,MS,IER)  ;Read from (F)COM.CM
      IF(IER.NE.1)TYPE" COMARG error:",IER
      WRITE(10,1)MAIN(1)          ;Type program name
1     FORMAT(' Program ',S13,' running.')
      CALL COMARG(0,F1,S1,JER)    ;Read from (F)COM.CM
      IF(JER.NE.1)TYPE" COMARG error (F1):",JER
      IF(N.EQ.1)GO TO 2           ;Test N
      CALL COMARG(0,F2,S2,KER)    ;Read from (F)COM.CM
      IF(KER.NE.1)TYPE" COMARG error (F2):",KER
      IF(N.EQ.2)GO TO 2           ;Test N
      CALL COMARG(0,F3,S3,LER)    ;Read from (F)COM.CM
      IF(LER.NE.1)TYPE" COMARG error (F3):",LER

Subroutine BYTEOUT

Subroutine BYTEOUT is similar to a byte pack subroutine
called BYTEPAC by Lt Carl Seelandt. BYTEOUT is used with
program SPENPLOT and packs two bytes of information into one
memory word. This information is then transferred to the
output device, the printer in this case. It was necessary to
use this version of BYTEOUT instead of byte pack, because
this version does not send extraneous dots or push up any
dots in a line when plotting with the Printronix model P300
printer.


      SUBROUTINE BYTEOUT(ICHAN,IBYTE)
      IF(IBYTE.EQ.999) GO TO 100
      MASK = 177400K
      IF(IFLAG.EQ.1) GO TO 50
      IOUT = IBYTE
      IOUT = ISHFT(IOUT,8)
      IOUT = IAND(IOUT,MASK)
      IFLAG = 1
      RETURN
50    IOUT = IOR(IOUT,IBYTE)
      WRITE BINARY(ICHAN) IOUT
100   IFLAG = 0
      RETURN
      END
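
The byte packing done by BYTEOUT can be illustrated with a short
Python sketch. This is an illustration only, not the original
routine: the class name BytePacker and its list-based output are
hypothetical, and treating the value 999 as "reset the pairing
state" is an assumption based on the way SPENPLOT calls BYTEOUT
with IBYTE = 999 before plotting.

```python
class BytePacker:
    """Pack pairs of 8-bit values into 16-bit words, high byte first."""

    MASK = 0o177400      # octal 177400K in the listing: high byte of a word
    RESET = 999          # sentinel value (assumed to clear the pairing flag)

    def __init__(self):
        self.words = []      # stands in for WRITE BINARY(ICHAN)
        self.pending = None  # high byte waiting for its partner

    def put(self, byte):
        if byte == self.RESET:
            self.pending = None
            return
        if self.pending is None:
            # First byte of a pair: shift into the high half and mask.
            self.pending = (byte << 8) & self.MASK
        else:
            # Second byte of a pair: OR into the low half and emit.
            self.words.append(self.pending | byte)
            self.pending = None


if __name__ == "__main__":
    packer = BytePacker()
    packer.put(999)
    packer.put(0o005)    # IPLOT
    packer.put(0o101)    # INON
    print([oct(w) for w in packer.words])
```

Because a word is written only after its second byte arrives, a
caller that sends an odd number of bytes leaves one byte pending
until the next call.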
PROGRAM MKPHON

This program allows the user to develop phoneme
templates. These templates can be used by program DRVR to
find the distance between the template and an input speech
file, any other template, or itself. Program MKPHON runs
interactively and builds a phoneme template from input
speech files that consist of frequency components (speech
files after the DFT, and only in the form of files from
program DRVR). For example, to make phoneme #22 from an
input file, you select the vector in the speech file that
is to start phoneme 22 and the number of consecutive vectors
from that input vector to include in the new or modified
phoneme 22. (If you specify 3 consecutive vectors to be
averaged in, and the phonemes are 5 vectors in length, then
the next 3 consecutive five-vector groupings will be
averaged into the specified five-vector phoneme.) When
program MKPHON is started it requests the name of the
phoneme file. If this phoneme file is a new file,
initialization procedures will begin: the program will ask
what values to set for the new phoneme template's
characteristics (DFT size, etc.). Program MKPHON is
currently limited to constant-length phoneme templates. The
information requested during initialization is used to fill
in the template header (block 0 in the file, see Table D-I).
Table D-I

Header Values used for Program DRVR and MKPHON

ELEMENT    CONTENTS

1-13       Observation file name (channel 4)
14-26      Speech file name (channel 5)
27         Switch: 1=preemphasize 0=don't preemphasize
28         Preemphasis slope
29         Preemphasis corner frequency
30         Number of time points per FFT
31         Switch: 1=Hamming window 0=rectangular window
**32       Normalization: 1 = normalize to unity
                          2 = no normalization
                          0 = divide by vector energy
33         Switch: 1=create test file 0=don't create
34-53      not used
54         Vector length of phonemes
55         Number of first time slice in file
56         Number of last time slice in file
57         Number of points per time slice in file
58         Switch: 1=overlapping 0=non-overlapping
59         Number of disk blocks in observation file
60         Switch: 1=deemphasis 0=no deemphasis
61         Deemphasis slope
62         Deemphasis corner frequency
*63        Switch: 1=phoneme file 0=not phoneme file
64-256     Used to store times phoneme has been modified
           (Can only store 193 modification numbers.)

*   Added for use by program MKPHON, which makes
    phoneme templates.

**  Entry 32 is used by program SPENPLOT, so
    proper spectrogram is plotted.


Table D-I lists the values found in a phoneme template
file header (block 0). It is important to initialize the
template properly, because only matching input files can be
used to modify a phoneme template.
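
The matching rule can be sketched in Python as follows. The list
of compared header entries (27 through 31, 57, 58, and 60 through
62) is taken from the DIFF checks in the MKPHON listing; the
function name count_header_differences and the use of Python lists
for the 256-integer headers are assumptions made for illustration.

```python
# Header entries (1-based, as in Table D-I) that MKPHON compares
# between the template header and the input file header.
COMPARED_ENTRIES = (27, 28, 29, 30, 31, 57, 58, 60, 61, 62)


def count_header_differences(template_header, input_header):
    """Return how many compared entries differ (0 means compatible)."""
    return sum(
        1
        for entry in COMPARED_ENTRIES
        if template_header[entry - 1] != input_header[entry - 1]
    )


if __name__ == "__main__":
    template = [0] * 256
    speech = [0] * 256
    speech[30 - 1] = 128          # different number of points per FFT
    print(count_header_differences(template, speech))
```

An input file is accepted only when the count is zero; file names
and time-slice limits are not part of the comparison.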


MKPHON averages each requested vector into the phoneme
template. Phoneme averaging for each vector component is
done by the following equation:

$$P_j^{*} = \frac{(P_j \cdot n) + V_j}{n + 1}$$

P* - new phoneme value
P  - old phoneme value
V  - input value to be averaged
n  - times phoneme has been modified
j  - phoneme vector component number

Each new modification to the phoneme template is equally
weighted with previous vectors. When each phoneme is
averaged (modified) it is also renormalized. The dc
component has been replaced by energy and is not normalized.
Care must be taken not to enter a phoneme from a speech file
which has been made from an overlapping 128 point DFT, which
may contain erroneous information because not enough
extended memory was used. This type of error can be seen in
Appendix M where an overlapping 128 point DFT phoneme
template was used for recognition results.
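
The averaging and renormalization steps can be sketched in Python
as below. This is a minimal illustration, not the MKPHON code: the
function names are hypothetical, a single vector is handled at a
time, and the scale of 10000 is the constant used in the
normalization loop of the MKPHON listing.

```python
import math


def average_in(phoneme, vector, n):
    """Running average: (P_j * n + V_j) / (n + 1) for every component j.

    n is the number of times the phoneme has already been modified.
    """
    return [(p * n + v) / (n + 1) for p, v in zip(phoneme, vector)]


def renormalize(vector, scale=10000.0):
    """Scale the frequency components to a fixed length.

    Element 0 holds the energy value that replaced the dc component
    and is left untouched.
    """
    norm = math.sqrt(sum(x * x for x in vector[1:]))
    if norm == 0.0:
        return list(vector)       # nothing to scale
    return [vector[0]] + [scale * x / norm for x in vector[1:]]


if __name__ == "__main__":
    template = average_in([0.0, 0.0, 0.0], [5.0, 3.0, 4.0], 0)
    print(renormalize(template))
```

Starting from n = 0 and averaging in the values 3, 6, and 9 in turn
gives 3, 4.5, and 6, so every contribution ends up with equal weight.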

The normal operation of program MKPHON is to enter a
speech file which consists of frequency components from
program DRVR (or files in the same format). The format of
the input files, as well as of the phoneme template, is:

Block 0 - a header of 256 integers which
contains important file information (see Table D-I).

Remaining Blocks - multiple data entries
consisting of an energy value and the appropriate
number of frequency components for each vector.
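
Given this layout, the disk block and array offset of any vector
follow from the number of components per vector. The Python sketch
below mirrors the STRBLK and IOFF computations in the SPENPLOT
listing; the function name locate_vector is hypothetical, and the
sketch assumes 128 reals per data block with block 0 reserved for
the header.

```python
REALS_PER_BLOCK = 128   # each data block holds 128 reals


def locate_vector(index, components):
    """Return (block, offset) for the 1-based vector `index`.

    `components` is the number of values stored per vector (the
    energy value plus the frequency components), header entry 57.
    `offset` is the 0-based position of the energy value in the block.
    """
    vectors_per_block = REALS_PER_BLOCK // components
    block = (index - 1) // vectors_per_block + 1    # block 0 is the header
    offset = ((index - 1) % vectors_per_block) * components
    return block, offset


if __name__ == "__main__":
    # 64 point DFT: 32 values per vector, 4 vectors per block.
    for i in (1, 4, 5):
        print(i, locate_vector(i, 32))
```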

Multiple files will most likely be entered to make one
phoneme template. The phoneme templates can be used by
programs developed by Martin to find the distance between
the template and other speech files. When the template has
been made it can be changed later in part or whole as
necessary. Every time the program is used it should be
terminated from the main menu in order to properly fill out
the header (block 0) for any changes made. If variable
length phoneme templates are required, major modifications
will have to be made in MKPHON.


C  PROGRAM:        MKPHON
C  LANGUAGE:       FORTRAN 5
C  DATE:           10 SEP 82
C  AUTHOR:         K. BEACHY
C  SUBJECT:        SPEECH, PHONEME GENERATION
C  LAST REVISION:  26 NOV 82
C
C  COMPILE:        FORTRAN MKPHON
C  LOAD:           RLDR MKPHON @FLIB@
C
C  This program allows the user to store speech in
C  multiples of 8ms time slices. These slices can
C  then be used as templates for the program "DRVR".
C
C  VLENGTH   - Length in vectors of phoneme
C  INUM      - Real components per phoneme vector
C  VECBLK    - Number of phoneme vectors per 128 component block
C  ISIZE     - Total size of one phoneme
C  BLKSRD    - Block needed to read or write
C  STRBLK    - Block to start read or write for phoneme
C  PHST      - Array position to start at
C  START     - Block to start read on input file
C  VECST     - Array position to start at
C  IPT       - Storage pointer for modification storage
C  STATUS(9) - Is the number of the last block in file
C              for CALL STAT(file,..)

C
C  Start program
C
      INTEGER FILEN(13),VLENGTH,VECBLK,ISIZE,BLKSRD,OPTION
      INTEGER FLAG,PHNUM,IPH,STRBLK,LASTPH,STP,START,VECST
      INTEGER IPT,STATUS(18),PHST,HEADER(256),HEADER2(256)
      INTEGER DIFF,TPLATE(13),VECLFT,STRB,MAXPHON
      DIMENSION AAR(512),PHAR(512)
C
C  Enter name of new or existing phoneme template
C  This template can be used with program DRVR to find the
C  distance between the phonemes and each vector in a speech file
C
      ACCEPT"ENTER FILENAME FOR PHONEME TEMPLATE.<15>
     / TEMPLATE FILE = "
      READ(11,1) TPLATE(1)
      CALL FOPEN(4,TPLATE)
      CALL RDBLK(4,0,HEADER,1,IER5)
      IF(IER5.EQ.9) GO TO 116
      IF(IER5.NE.1)TYPE"ERROR ON RDBLK, IER5=",IER5
C 

C  Check  template  header  to  see  if  template  is  new 

C  if  new  initialize-if  not  skip  initialization 

C 

      IF(HEADER(63).EQ.1) GO TO 111
C
C  INITIALIZE THE PHONEME TEMPLATE HEADER(BLOCK 0)
C
116   DO 115 J1=1,256                ;Clear header
      HEADER(J1) = 0
115   CONTINUE
112   TYPE"INITIALIZE NEW PHONEME PROTOTYPES<15>"
      ACCEPT"Max number of phonemes for TEMPLATE = ",MAXPHON
      ACCEPT"Enter 0 for NO preemphasis,<15> 1 for
     / preemphasis.<15> Preemphasis = ",HEADER(27)
      ACCEPT"Preemphasis slope(db) = ",HEADER(28)
      ACCEPT"Preemphasis corner frequency(hz) = ",HEADER(29)
      ACCEPT"Enter 0 for NO deemphasis,<15> 1 for
     / deemphasis.<15> Deemphasis = ",HEADER(60)
      ACCEPT"Deemphasis slope(db) = ",HEADER(61)
      ACCEPT"Deemphasis corner frequency(hz) = ",HEADER(62)
      ACCEPT"Enter 0 for rectangular window,<15> 1 for
     / Hamming window.<15> WINDOW = ",HEADER(31)
      ACCEPT"Enter 0 for non-overlapping,<15> 1 for
     / overlapping.<15> OVERLAP = ",HEADER(58)
      ACCEPT"Length of phonemes(in vectors) = ",HEADER(54)
      TYPE" Enter number of components per vector.<15>
     / Note: A 128 point FFT = 64 components per vector."
      ACCEPT"Number of components per vector = ",HEADER(57)
      HEADER(30) = HEADER(57)*2
      HEADER(55) = 1       ;First time slice or phoneme
      HEADER(63) = 1       ;value set to show TPLATE has been init
C  # OF TIME SLICES = MAXPHON*VECTORS PER PHONEME
      HEADER(56) = MAXPHON*HEADER(54)

C 

C  Now  print  out  values  and  check  initialization 

C 

      TYPE"<15><15> TEMPLATE INITIALIZATION VALUES"
      TYPE"<15>Number of phonemes in template =",MAXPHON
      TYPE"Vectors(time slices) per phoneme =",HEADER(54)
      TYPE"Total vectors in template =",HEADER(56)
      TYPE"Preemphasis(0=no, 1=yes) =",HEADER(27)
      TYPE"Slope(db) =",HEADER(28)
      TYPE"Corner frequency(hz) =",HEADER(29)
      TYPE"Deemphasis(0=no, 1=yes) =",HEADER(60)
      TYPE"Slope(db) =",HEADER(61)
      TYPE"Corner frequency(hz) =",HEADER(62)
      TYPE"Points per time slice =",HEADER(57)
      TYPE"Points per FFT =",HEADER(30)
      TYPE"Window(0=rectangular, 1=Hamming) =",HEADER(31)
      TYPE"Overlap(0=no overlap, 1=overlap) =",HEADER(58)
114   TYPE"<15><15>MAIN >OPTIONS:"
      TYPE" 1 = CONTINUE<15> 2 = RE-INITIALIZE"
      ACCEPT"<15> OPTION = ",IOPT
      IF(IOPT.EQ.2) GO TO 112

C
C  Set values for program operations
C
111   VLENGTH = HEADER(54)
      MAXPHON = HEADER(56)/VLENGTH
      INUM = HEADER(57)
      VECBLK = 128/INUM              ;128 reals per block
      ISIZE = INUM*VLENGTH
      BLKSRD = INT((VLENGTH-1)/VECBLK)+1
      GO TO 35
C
C  Be sure to close chan 2 at proper times!!!
C
30    CALL CLOSE(2,IER6)
      IF(IER6.NE.1)TYPE"ERROR ON CLOSE, IER6=",IER6
      GO TO 35
34    TYPE "TRY AGAIN, ERROR ON OPEN IER3 =",IER3
C
C  Enter source file for vectors to add to phoneme template
C  This file will be checked for compatibility with template
C
35    TYPE"ENTER FILENAME WHICH CONTAINS FREQUENCY"
      TYPE" COMPONENTS.(FROM MARTIN'S PROGRAM)"
      ACCEPT"<15> FILENAME = "
      READ(11,1) FILEN(1)
1     FORMAT(S13)
      CALL OPEN(2,FILEN,1,IER3)
      IF(IER3.NE.1) GO TO 34


50    WRITE(10,52) FILEN(1)
52    FORMAT("<15> PRESENT INPUT FILE: ",S13)
C
C  CHECK SOURCE FILE FOR COMPATIBLE VALUES
C
      CALL RDBLK(2,0,HEADER2,1,IER12)
      IF(IER12.NE.1) TYPE "ERROR ON RDBLK, IER12=",IER12
      DIFF = 0
      IF(HEADER(27).NE.HEADER2(27)) DIFF = DIFF+1
      IF(HEADER(28).NE.HEADER2(28)) DIFF = DIFF+1
      IF(HEADER(29).NE.HEADER2(29)) DIFF = DIFF+1
      IF(HEADER(30).NE.HEADER2(30)) DIFF = DIFF+1
      IF(HEADER(31).NE.HEADER2(31)) DIFF = DIFF+1
      IF(HEADER(57).NE.HEADER2(57)) DIFF = DIFF+1
      IF(HEADER(58).NE.HEADER2(58)) DIFF = DIFF+1
      IF(HEADER(60).NE.HEADER2(60)) DIFF = DIFF+1
      IF(HEADER(61).NE.HEADER2(61)) DIFF = DIFF+1
      IF(HEADER(62).NE.HEADER2(62)) DIFF = DIFF+1
      IF(DIFF.EQ.0) GO TO 150

      TYPE"THE INPUT FILE AND THE PHONEME FILE ARE NOT
     / COMPATIBLE!"
C
C  Tell user about problem and show values
C
      TYPE"Number of difference(s) = ",DIFF
      WRITE(10,56) FILEN(1)
56    FORMAT("VALUES FOR TPLATE AND ",S13," RESPECTIVELY")
      TYPE"Preemphasis(0=no, 1=yes) =",
     / HEADER(27),HEADER2(27)
      TYPE"Slope(db) =",HEADER(28),HEADER2(28)
      TYPE"Corner frequency(hz) =",HEADER(29),HEADER2(29)
      TYPE"Deemphasis(0=no, 1=yes) =",
     / HEADER(60),HEADER2(60)
      TYPE"Slope(db) =",HEADER(61),HEADER2(61)
      TYPE"Corner frequency(hz) =",HEADER(62),HEADER2(62)
      TYPE"Points per time slice =",HEADER(57),HEADER2(57)
      TYPE"Points per FFT =",HEADER(30),HEADER2(30)
      TYPE"Window(0=rectangular, 1=Hamming) =",
     / HEADER(31),HEADER2(31)
      TYPE"Overlap(0=no overlap, 1=overlap) =",
     / HEADER(58),HEADER2(58)
120   TYPE"<15>INPUT FILE NOT COMPATIBLE WITH PHONEME
     / TEMPLATE<15>"
C
C  Give user option on mistake
C
      TYPE">OPTIONS:"
      TYPE" 1 = READ IN NEW FILE"
      TYPE" 2 = TERMINATE PROGRAM"
      TYPE" 3 = RE-INITIALIZE TEMPLATE"
      ACCEPT"<15> OPTION = ",IOP
      IF(IOP.EQ.1) GO TO 30
      IF(IOP.EQ.2) GO TO 40
      CALL CLOSE(2,IER16)
      IF(IER16.NE.1)TYPE"ERROR ON CLOSE, IER16=",IER16
      IF(IOP.EQ.3) GO TO 112
      GO TO 120
C 

C  Main  option  list(should  always  terminate  program  here) 

C 

150   TYPE"<15>>MAIN OPTIONS:"
      TYPE" 1 = MAKE PHONEME TEMPLATE"
      TYPE" 2 = READ IN NEW FILE"
      TYPE" 3 = CHANGE MAX NUMBER OF PHONEMES"
      TYPE" 4 = TERMINATE PROGRAM"
      ACCEPT"<15> OPTION = ",IOPTION
C
      IF(IOPTION.EQ.1) GO TO 25
      IF(IOPTION.EQ.2) GO TO 30
      IF(IOPTION.EQ.3) GO TO 160
      IF(IOPTION.EQ.4) GO TO 40
      GO TO 150
C
160   WRITE(10,8) MAXPHON,HEADER(54)
8     FORMAT(1X,I3," IS THE MAX NUMBER OF PHONEMES IN TEMPLATE "
     /"WITH",I2," VECTORS PER PHONEME.")
      ACCEPT"ENTER, NEW MAXIMUM = ",MAXPHON
      HEADER(56) = MAXPHON*HEADER(54)
      GO TO 150
C
170   WRITE(10,9) MAXPHON
9     FORMAT("TRY AGAIN, Template's max phoneme number set at "
     /,I3)
25    WRITE(10,52) FILEN(1)
      TYPE"<15>>OPTIONS: 0 = FORM NEW PHONEME"
      TYPE" 1 = AVERAGE IN PHONEMES"
      ACCEPT" 2 = return to main options >OPTION = ",
     /IOPTION
      IF(IOPTION.EQ.0) GO TO 10
      IF(IOPTION.EQ.1) GO TO 20
      GO TO 150
C
20    WRITE(10,6) HEADER(54)
6     FORMAT(/"AVERAGE IN PHONEMES.",I3," VECTORS PER PHONEME.")
      IFLAG = 1
      GO TO 15
C
10    WRITE(10,7) HEADER(54)
7     FORMAT(/"FORM NEW PHONEME.",I3," VECTORS PER PHONEME.")
      IFLAG = 0


C
C  READ IN PHONEME
C
15    ACCEPT"<15> PHONEME NUMBER = ",PHNUM
      IF(PHNUM.GT.MAXPHON) GO TO 170
      IPH = PHNUM-1
      STRBLK = INT(((IPH*VLENGTH)+VECBLK)/VECBLK)
      PHST = MOD(IPH,VECBLK)*INUM
      CALL RDBLK(4,STRBLK,PHAR,BLKSRD,IER2)
      IF(IER2.NE.1.AND.IER2.NE.9)TYPE"ERROR ON RDBLK, IER2=",IER2
C
C  READ IN VECTOR
C
      GO TO 220
230   TYPE" TRY AGAIN, LAST AVAILABLE VECTOR =",HEADER2(56)
220   ACCEPT"ENTER, FIRST VECTOR OF PHONEME = ",STR
      GO TO 210
200   TYPE" TRY AGAIN, CONSECUTIVE VECTORS LEFT =",VECLFT
210   ACCEPT"ENTER, TOTAL CONSECUTIVE PHONEMES TO AVERAGE = ",
     /NUMVEC
      VECLFT = HEADER2(56)-STR+1
      IF(STR.GT.(HEADER2(56)-(HEADER(54)-1))) GO TO 230
      IF(NUMVEC*HEADER(54).GT.VECLFT) GO TO 200
      DO 100 JX=1,NUMVEC
      STRB = STR+JX-2
      START = INT((STRB+VECBLK)/VECBLK)   ;SET BLOCK
      VECST = MOD(STRB,VECBLK)*INUM       ;0,1,...,LAST VECTOR
      CALL RDBLK(2,START,AAR,BLKSRD,IER4)
      IF(IER4.NE.1)TYPE"ERROR ON RDBLK, IER4=",IER4
C
C  AVERAGE VECTOR WITH PHONEME
C
C  IPT, IS STORAGE FOR MOD NUMBER IN TEMPLATE HEADER
      IPT = PHNUM+63                 ;CAN STORE 193 PHONEMES
C
      IF(JX.GT.1) IFLAG=1
      IF(IFLAG.NE.0) GO TO 60
      HEADER(IPT) = 0
C
60    DO 70 J=1,ISIZE                ;AVERAGE PHONEMES
      PHAR(J+PHST)=((PHAR(J+PHST)*HEADER(IPT))
     / +AAR(J+VECST))/(HEADER(IPT)+1)
70    CONTINUE
C
      HEADER(IPT) = HEADER(IPT)+1
C
      WRITE(10,54) PHNUM,HEADER(IPT)
54    FORMAT("PHONEME ",I3,", HAS BEEN MODIFIED ",I2," TIME(S).")
C
C  NORMALIZE EACH VECTOR
C
      JOFF = 1+PHST
      ICOMPS = INUM-1
      DO 90 IIN=1,VLENGTH
      SUME = 0.0
      DO 80 J=1,ICOMPS
80    SUME = SUME+(PHAR(J+JOFF)**2)
      ENERGY = SQRT(SUME)
      DO 85 J=1,ICOMPS
85    PHAR(J+JOFF) = (10000)*(PHAR(J+JOFF))/ENERGY
      JOFF = (JOFF+ICOMPS)+1
90    CONTINUE
C
      CALL WRBLK(4,STRBLK,PHAR,BLKSRD,IER7)
      IF(IER7.NE.1)TYPE"ERROR ON WRBLK, IER7=",IER7
100   CONTINUE
C
      GO TO 25             ;return to mod or add to phonemes
C
40    CALL STAT(TPLATE,STATUS,IER10)
APPENDIX E

Program TOPS

Program TOPS takes distance files from DRVR and
prepares that data for use by program LEARN. TOPS output
can be seen in Figure E-1. TOPS also decides on the
beginning and end points of speech based on the energy
present in each vector of speech (8 ms for a 64 point DFT
and 8 kHz sampling of the original speech). In addition,
TOPS creates a listing of the top phoneme choice for each
vector for use in resynthesis of speech.
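
The 8 ms figure follows from 64 samples at 8000 samples per second.
The source does not give the rule TOPS uses to pick the end points,
so the Python sketch below is only a hypothetical illustration of
energy-based end-point detection: the fixed threshold, the function
names, and the example energies are all assumptions, and the
duration calculation assumes non-overlapping vectors.

```python
def frame_duration_ms(dft_points, sample_rate_hz):
    """Duration of one non-overlapping vector in milliseconds."""
    return 1000.0 * dft_points / sample_rate_hz


def find_endpoints(energies, threshold):
    """Return (first, last) 0-based indices of vectors above threshold.

    Returns None when no vector exceeds the threshold. A fixed
    threshold is an assumption, not the documented TOPS rule.
    """
    above = [i for i, e in enumerate(energies) if e > threshold]
    if not above:
        return None
    return above[0], above[-1]


if __name__ == "__main__":
    print(frame_duration_ms(64, 8000))
    print(find_endpoints([0, 0, 5, 9, 7, 0, 0], threshold=4))
```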
TABLE E-I

Distance File Header


CONTENTS

Distance file name
Observation file name
Phonet file name
Number of first observation time slice to do
Number of last observation time slice to do
Number of first phonet time slice to do
Number of last phonet time slice to do
Number of disk block that holds first observation
Number of disk block that holds first phonet
Switch: 4=observation and phonet files identical
not used
Number of observation time slices to do
Number of phonet time slices to do
Number of elements per time slice
Number of extended memory blocks used
not used
* Switch: 1=overlapping 0=non-overlapping
not used


* Added to subroutine DSTN of program DRVR; the value
here will act as a switch for program TOPS.

Distance files are created by option 3 of DRVR.


Figures E-1, E-2, E-3, E-5, and E-7 are examples of
output from program TOPS, and E-4, E-6, and E-8 are output
from CHOICES.
Figure E-1. Feature Extraction for FOUR
M2 (top) and M1 (bottom) Distance, Single-Vector Template

Figure E-2. Feature Extraction for ONE
M2 Distance, Single-Vector Template

Figure E-3. Feature Extraction for ONE
M1 Distance, Single-Vector Template

Figure E-4. Feature Extraction for ONE
M1 Distance, Five-Vector Template

Figure E-5. Feature Extraction for ONE at 3Gs
M2 Distance, Single-Vector Template

THE DATE IS -- 9 3 1982
THE TIME IS -- 17 19 16

VECTOR   FIRST    SECOND   THIRD    FOURTH   FIFTH    SCALE
NUMBER   CHOICE   CHOICE   CHOICE   CHOICE   CHOICE   FACTOR

[Printout rows for vectors 12 through 43: a phoneme number and
relative score for each of the five choices, followed by the
scale factor. The individual entries are not legible in this copy.]


Figure E-6. Feature Extraction for ONE at 3Gs
M1 Distance, Five-Vector Template

Figure E-7. Feature Extraction for ONE at 5Gs
M2 Distance, Single-Vector Template

Figure E-8. Feature Extraction for ONE at 5Gs
M1 Distance, Five-Vector Template

PROGRAM:        TOP5 AND ATOP5
LANGUAGE:       FORTRAN5
DATE:           18 SEP 82
AUTHOR:         K. BEACHY
SUBJECT:        SPEECH, LIST TOP PHONEMES
LAST REVISION:  15 OCT 82


Regular interactive operation:

COMPILE: FORTRAN/X TOP5
LOAD:    RLDR TOP5 IOFT5 BFLIBB


For auto execution, rename the source file to ATOP5.FR and
compile without the /X option.

COMPILE: FORTRAN ATOP5
LOAD:    RLDR ATOP5 IOFT5 BFLIBB

TO USE:  Name the input file "FILE3" and enter

         ATOP5 speaker(or file) word

         The speaker and the word are passed to the
         main program by subroutine IOFT5.
         Example: ATOP5 HCPO.SP ENTER

This program takes blocks of data prepared by program
DRVR (written by Martin; see his thesis, DEC 1982).
The information in the data blocks is put in a form to be
used by program LEARN.

The first block is a header block of pertinent file
information. The header is a 256-integer array.

The data is in the remaining blocks and is arranged as
14 integer elements for each time slice of original speech.

The 14 elements are:

 1  Time slice number of file
 2  Energy of slice (useful since each time slice is normalized)
 3  Phoneme number with maximum distance to time slice
 4  Maximum distance
 5  Phoneme number that has the minimum distance to time slice
 6  Minimum distance
 7  Phoneme number that has the next minimum distance
 8  Next minimum distance


13  Phoneme number with the 5th minimum distance
14  The 5th minimum distance

The data is stored as 14 elements followed by the next 14,
for all the data. The total number of time slices
is found in the header (1st block), HEADER(48).
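
The record layout described above can be sketched in Python (chosen here only for illustration; the program itself is FORTRAN 5). This is a minimal sketch, assuming the data blocks have already been read into a flat list of integers. The names `Slice` and `parse_slices` are hypothetical, and treating elements 9 through 12 as the third and fourth closest phonemes with their distances is an assumption that follows the pattern of the listed elements.

```python
from typing import List, NamedTuple, Tuple

RECORD_LEN = 14  # integers stored per time slice


class Slice(NamedTuple):
    number: int                      # element 1: time slice number
    energy: int                      # element 2: energy of the slice
    farthest: Tuple[int, int]        # elements 3-4: (phoneme, maximum distance)
    nearest: List[Tuple[int, int]]   # elements 5-14: five (phoneme, distance) pairs


def parse_slices(store: List[int], n_slices: int) -> List[Slice]:
    """Split a flat integer array into 14-element time-slice records."""
    slices = []
    for i in range(n_slices):
        rec = store[i * RECORD_LEN:(i + 1) * RECORD_LEN]
        if len(rec) < RECORD_LEN:
            raise ValueError("store holds fewer than n_slices records")
        # Zero-based positions 4, 6, 8, 10, 12 start the five closest choices.
        pairs = [(rec[k], rec[k + 1]) for k in range(4, RECORD_LEN, 2)]
        slices.append(Slice(rec[0], rec[1], (rec[2], rec[3]), pairs))
    return slices
```

In this sketch `n_slices` plays the role of HEADER(48), and `slices[i].nearest[0]` is the top-choice phoneme and its distance for slice `i`.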

VARIABLES AND VALUES

FILENM(13) -   Holds input filename
FILE3 -        Is the auto program filename
HEADER(256) -  Holds header information from file
STATUS(18) -   Receives information on the file of
               interest. This program uses CALL STAT to find out
               how many blocks should be read from the
               input file FILENM or FILE3.
STORE(5888) -  Input file is stored in this array.
               This array can handle 23 blocks of input, which is
               equal to 100 blocks of original speech (2.3 sec).
               This is based on distance choices every 8 ms.
SPEAKER(13) -  When available, the speaker is read
               from the header.
SAID(50) -     Up to 50 characters may be specified for
               what the speaker said.
MAXMIN -       The maximum minimum distance (of the top choice distance)
VALUE(5) -     Holds the values for each time slice. This
               value is the relative score for each phoneme for that time slice.
SET -          The offset required to read the correct value from
               the STORE array
SKIP -         A dummy value used for auto execution and to help
               avoid compiler error ("next statement can not be reached")
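
The roles of MAXMIN, VALUE, and the scale factor can be illustrated with a short Python sketch of the arithmetic used in the listing that follows. It is a sketch under stated assumptions, not the program itself: the function names are hypothetical, distances are passed in as plain integers, and Python's double-precision floats stand in for FORTRAN REAL, so a result could differ in a borderline truncation.

```python
from typing import List


def max_of_min(min_distances: List[int]) -> int:
    """Largest top-choice (minimum) distance over all slices; starts at 1 like MAXMIN."""
    return max([1] + list(min_distances))


def relative_scores(max_distance: int, top_distances: List[int]) -> List[int]:
    """Score each top choice from 0 to 100 relative to the best choice.

    The closest phoneme scores 100; a phoneme as far away as the
    maximum distance would score 0. int() truncates, as INT does.
    """
    span = max_distance - top_distances[0]
    if span == 0:
        raise ValueError("maximum and minimum distance are equal")
    return [int(100 * float(max_distance - d) / float(span)) for d in top_distances]


def scale_factor(min_distance: int, maxmin: int) -> float:
    """Minimum distance of a slice as a fraction of the largest minimum distance."""
    return float(min_distance) / float(maxmin)
```

For example, with a maximum distance of 500 and top-five distances of 100, 150, 200, 250, and 300, the scores are 100, 87, 75, 62, and 50.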

C  START PROGRAM
C
      INTEGER FILENM(13),HEADER(256),STATUS(18),BSET
      INTEGER STORE(5888),SPEAKER(13),SAID(50),MAXMIN
      INTEGER VALUE(5),SET,IDATE(3),SKIP
      INTEGER THRESH,SETOT,SPEC,NUMPHX,NOISE
      REAL SFACTOR
C
C  Use SKIP for interactive use
C
      SKIP = 1                ;set for interactive program
      IF(SKIP.EQ.1) GO TO 5
C
C  The next 5 lines are for auto program execution,
C  (which are skipped over for interactive use)
C
      CALL IOFT5(2,M1,SPEAKER,SAID,I1,M2,I2,I3,I4)
      CALL STAT("FILE3",STATUS,IER)
      IF(IER.NE.1) TYPE"ERROR ON STAT, IER=",IER
      CALL OPEN(1,"FILE3",1,IER2)
      IF(IER2.NE.1) TYPE"ERROR ON OPEN, IER2=",IER2

C
C  Ask for interactive information
C
      TYPE"ENTER FILENAME WHICH CONTAINS DISTANCE
     /  <15> INFORMATION. (FROM DRVR PROGRAM)"
      ACCEPT"<15> FILENAME = "
      READ(11,1) FILENM(1)
1     FORMAT(S13)
      ACCEPT"<15> WORD SPOKEN = "
      READ(11,2) SAID(1)
2     FORMAT(S50)

C
C  Set blocks to be read in
C
      CALL STAT(FILENM,STATUS,IER)
      IF(IER.NE.1) TYPE"ERROR ON STAT, IER=",IER
      BLOCKS = STATUS(9)+1
C
      CALL OPEN(1,FILENM,1,IER2)
      IF(IER2.NE.1) TYPE"ERROR ON OPEN, IER2=",IER2
      CALL RDBLK(1,0,HEADER,1,IER3)
      IF(IER3.NE.1) TYPE"ERROR ON RDBLK, IER3=",IER3
      CALL RDBLK(1,1,STORE,STATUS(9),IER4)
      IF(IER4.NE.1) TYPE"ERROR ON RDBLK, IER4=",IER4
      CALL CLOSE(1,IER5)
      IF(IER5.NE.1) TYPE"ERROR ON CLOSE, IER5=",IER5
C
C  Set up output file that contains all vectors
C
      CALL DFILW("OUT5",IER6)
      IF(IER6.NE.1) TYPE"ERROR ON DFILW, IER6=",IER6
      OPEN 4,"OUT5"
C
C  Set file with no noise, to give to recognition program
C
      CALL DFILW("OUT7",IER7)
      IF(IER7.NE.1) TYPE"ERROR ON DFILW, IER7=",IER7
      OPEN 5,"OUT7"
C
C  Open file for list of top choice, to be used for
C  speech generation
C
      CALL DFILW("OUTT",IERA)
      IF(IERA.NE.1) TYPE"ERROR ON DFILW, IERA=",IERA
      OPEN 3,"OUTT"
C
      CALL FGTIME(IHOUR,IMIN,ISEC)
      CALL DATE(IDATE,IER8)
      CALL CHECK(IER8)


X     DO 30 I=1,13
X     SPEAKER(I) = HEADER(I+13)
X30   CONTINUE
X     WRITE(10,3) SPEAKER(1)
X3    FORMAT("<15> SPECTRAL FILE: ",S13)
C
C  send header to proper files
C
      WRITE(4,230) SAID(1)
      WRITE(5,235) SAID(1)
230   FORMAT(15X,"SENTENCE SPOKEN : ",S50 /)
235   FORMAT(S50)
C
      WRITE(4,240) SPEAKER(1)
      WRITE(5,245) SPEAKER(1)
240   FORMAT(19X,"SPEAKER WAS : ",S13 /)
245   FORMAT(S13)
C
      WRITE(4,247) IDATE
      WRITE(4,249) IHOUR,IMIN,ISEC
      WRITE(5,247) IDATE
      WRITE(5,249) IHOUR,IMIN,ISEC
247   FORMAT(15X,"THE DATE IS-- ",2I3,I5)
249   FORMAT(15X,"THE TIME IS-- ",3I3 //)

C 

      WRITE(4,250)
      WRITE(4,260)
      WRITE(4,270)
      WRITE(5,250)
      WRITE(5,260)
      WRITE(5,270)
C
250   FORMAT(4X,"VECTOR",4X,"FIRST",3X,"SECOND",4X,"THIRD",
     /  3X,"FOURTH",4X,"FIFTH",7X,"SCALE",6X,"VECTOR")
260   FORMAT(4X,"NUMBER",3X,"CHOICE",3X,"CHOICE",3X,"CHOICE",
     /  3X,"CHOICE",3X,"CHOICE",6X,"FACTOR",6X,"ENERGY")
270   FORMAT(4X,"******",2X,"*******",2X,"*******",2X,"*******",
     /  2X,"*******",2X,"*******",5X,"*******",5X,"*******" /)

C
C  Find the maximum of the minimum distance
C  Also do one vector less than the total number, this will
C  eliminate any anomalies caused by the overlapping window.
C
      MAXMIN = 1
      IDONE = HEADER(48)-1    ;list all but last vector
C
C  If the files are the same or are non-overlapping, do all
C
      IF(HEADER(46).EQ.4.OR.HEADER(58).EQ.1) IDONE=IDONE+1
      DO 40 I=1,IDONE
      SET = (I-1)*14
      IF(MAXMIN.GT.STORE(SET+6)) GO TO 40
      MAXMIN = STORE(SET+6)
40    CONTINUE
C
C  Initialize first loop
C
      IBEGIN = 1
      IEND = IDONE
      BSET = 0
C
C  Set threshold for 64 or 32 components
C  HEADER(50) = number of components in FFT
C
      IF(HEADER(50).EQ.128) THRESH=100
      IF(HEADER(50).EQ.64) THRESH=50
C
C  FIRST LOOP - this loop sends data to OUT5 and also
C  sets the beginning and end vectors for the OUT7 file;
C  hopefully this will eliminate noise present before and after word.
C
      DO 55 I=1,IDONE
      SET = (I-1)*14
      DO 50 J=1,5             ;5=NUMBER OF TOP CHOICES
      VALUE(J) = INT(100*FLOAT(STORE(SET+4)-STORE(SET+6+((J-1)*2))) /
     /  FLOAT(STORE(SET+4)-STORE(SET+6)))
50    CONTINUE
      SFACTOR = FLOAT(STORE(SET+6))/FLOAT(MAXMIN)
C
C  Set begin and end vectors for OUT7. Ignore short term
C  peaks (less than 5 vectors above THRESHOLD)
C
      IF(BSET.EQ.1) GO TO 310
      IF(STORE(SET+2).GT.THRESH) GO TO 300
      COUNT = 0
      GO TO 350
300   CONTINUE
      COUNT = COUNT+1
      IF(COUNT.GT.4) GO TO 305
      GO TO 350
305   IBEGIN = I-4
      BSET = 1
      GO TO 350
310   CONTINUE
      IF(STORE(SET+2).GT.THRESH) GO TO 320
      COUNT = 0
      GO TO 350
320   CONTINUE
      COUNT = COUNT+1
      IF(COUNT.GT.4) GO TO 330
      GO TO 350
330   CONTINUE
      IEND = I
350   CONTINUE
C
C  Send data to OUT5
C
      WRITE(4,280) I,STORE(SET+5),VALUE(1),STORE(SET+7),
     /  VALUE(2),STORE(SET+9),VALUE(3),STORE(SET+11),VALUE(4),
     /  STORE(SET+13),VALUE(5),SFACTOR,STORE(SET+2)
55    CONTINUE

C
C  The next section, up to statement 75, is used to generate
C  the file needed for program LEARN (recognition algorithm)
C
C  First two statements used to adjust beginning or end as desired
C
      IBEGIN = IBEGIN-0
      IEND = IEND+0
      IF(IBEGIN.LE.0) IBEGIN = 1
      IF(IEND.GT.IDONE) IEND = IDONE
C
      SETOT = 1               ;set value for counter used to make OT files
C
      DO 75 I=IBEGIN,IEND
      SET = (I-1)*14
      DO 70 J=1,5
      VALUE(J) = INT(100*FLOAT(STORE(SET+4)-STORE(SET+6+((J-1)*2))) /
     /  FLOAT(STORE(SET+4)-STORE(SET+6)))
70    CONTINUE
      SFACTOR = FLOAT(STORE(SET+6))/FLOAT(MAXMIN)
C
C  Set all noise phonemes to 1, or to noise phoneme number
C  Set SPEC, NUMPHX, and NOISE for special phoneme templates.
C  For special templates use proper equations between 73 and 72.
C
      SPEC = 126              ;special template
      NUMPHX = 5              ;number of vectors in phoneme
      NOISE = 24              ;number of the noise phoneme
      DO 72 IX=1,5
      ISET = SET+3+(2*IX)
C
C  For special templates
C
      IF(HEADER(43).EQ.SPEC) GO TO 73
      IF(STORE(ISET).GE.71) STORE(ISET)=1   ;for 71 + noise phonemes
      GO TO 72
C
C  Special equations to reduce template number into proper phonemes
C
73    CONTINUE
      STORE(ISET) = INT((STORE(ISET)-1)/NUMPHX)+1
      IF(STORE(ISET).GT.NOISE) STORE(ISET)=NOISE
72    CONTINUE
      WRITE(5,280) I,STORE(SET+5),VALUE(1),STORE(SET+7),
     /  VALUE(2),STORE(SET+9),VALUE(3),STORE(SET+11),
     /  VALUE(4),STORE(SET+13),VALUE(5),SFACTOR,STORE(SET+2)
C
C  Set up for speech syn.
C
      WRITE BINARY(3) SETOT,STORE(SET+5)
      SETOT = SETOT+1
75    CONTINUE
C
280   FORMAT(5X,I4,3X,I3,1X,I3,2X,I3,1X,I3,2X,I3,1X,
     /  I3,2X,I3,1X,I3,2X,I3,1X,I3,3X,F11.8,4X,I6)
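
The begin/end search in the first loop of the listing is the least obvious part of the program, so a short Python sketch of the same rule may help. It is a sketch under stated assumptions: `find_endpoints` is a hypothetical name, the slice energies are passed in as a list, and the counter is taken to start at zero (the listing does not initialize COUNT explicitly).

```python
from typing import List, Tuple


def find_endpoints(energies: List[int], thresh: int, run: int = 5) -> Tuple[int, int]:
    """Return 1-based (begin, end) slice numbers of the spoken word.

    The word begins at the first slice of the first run of `run`
    consecutive slices whose energy exceeds `thresh`. After that, the
    end is moved forward whenever the current run is again at least
    `run` slices long, so shorter peaks are ignored.
    """
    begin, end = 1, len(energies)   # defaults: keep every slice
    found_begin = False
    count = 0
    for i, energy in enumerate(energies, start=1):
        if energy > thresh:
            count += 1
            if count >= run:
                if not found_begin:
                    begin = i - (run - 1)
                    found_begin = True
                else:
                    end = i
        else:
            count = 0
    return begin, end
```

Note that, as in the listing, the end stays at the last slice if no qualifying slice follows the one that fixes the beginning.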


APPENDIX  F 


AUTOMATIC  PROGRAMS 


This research needed as many automatic programs as
possible (programs that process many files at one time
automatically). Programs were adjusted to run automatically
using a macrofile. When a program runs automatically, it is
not practical to enter values interactively; IOFT5 can pass
information to programs that need input data. The programs
listed in this appendix are written in PASCAL and are based
upon a program developed by Montgomery (Ref G). These
programs are used to make macrofiles interactively. The
macrofiles are used to run the programs SPENPLOT, MKPHON,
TOP5, and DRVR (including special versions of DRVR). Using
the macrofiles enables auto operation of programs.
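
The idea behind these macrofile builders can be sketched in Python: for each speech file, emit the command lines that stage the file, run the programs, and rename the outputs after the input file. This is a minimal sketch modeled on the AUTOTOP5 listing below, not a replacement for it. The function names and the template name in the usage note are hypothetical, and reading the output extensions as the letter O followed by 7 or T is an assumption, since the scan does not distinguish the letter O from the digit 0.

```python
from typing import List


def retag(filename: str, tag: str) -> str:
    """Overwrite the characters after the first '.' with `tag`."""
    dot = filename.index(".")
    return filename[:dot + 1] + tag + filename[dot + 1 + len(tag):]


def atop5_macro_lines(directory: str, template: str,
                      filename: str, word: str) -> List[str]:
    """Command lines that process one speech file with DRVR and ATOP5."""
    return [
        "DELETE/V FILE1",
        f"MOVE/V {directory} {filename}/S FILE1",   # stage the speech file
        "DRVS",
        "DELETE/V FILE1",
        f"MOVE/V {directory} {template}/S FILE1",   # stage the phoneme template
        "DRVD",
        f"ATOP5 {filename} {word}",
        f"RENAME OUT7 {retag(filename, 'O7')}",
        f"RENAME OUTT {retag(filename, 'OT')}",
    ]
```

For example, `atop5_macro_lines("DIR", "PHON.TP", "HCP0.SP", "ZERO")` ends with the lines `RENAME OUT7 HCP0.O7` and `RENAME OUTT HCP0.OT`.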
PROGRAM AUTOTOP5 (INPUT,OUTPUT,AOUT);

VAR
  I:INTEGER;
  DIRECTORY:STRING[20];
  TPLATE:STRING[20];
  FILENAME:STRING[20];
  WORD:STRING[30];
  FLAG:BOOLEAN;
  AOUT:TEXT;

BEGIN(*PROGRAM AUTOTOP5*)
  FLAG:=FALSE;
  REWRITE(AOUT);
  DIRECTORY:='  ';
  WRITELN('INPUT NAME OF DIRECTORY WHERE PROGRAM WILL RUN.');
  WRITE('DIRECTORY = ');
  READLN(DIRECTORY);
  TPLATE:='  ';
  WRITELN('INPUT NAME OF PHONEME TEMPLATE');
  WRITE('PHONEME TEMPLATE = ');
  READLN(TPLATE);
  REPEAT
    FILENAME:='  ';
    WRITELN('INPUT NAME OF SPEECHFILE TO BE PROCESSED.');
    WRITE('FILENAME = ');
    READLN(FILENAME);
    WORD:='  ';
    WRITELN('INPUT WORD(S) SPOKEN.');
    WRITE('WORD(S) = ');
    READLN(WORD);
    WRITELN;
    WRITELN(AOUT,'DELETE/V FILE1 ');
    WRITELN(AOUT,'MOVE/V ',DIRECTORY,' ',FILENAME,'/S FILE1');
    WRITELN(AOUT,'DRVS');
    WRITELN(AOUT,'DELETE/V FILE1 ');
    WRITELN(AOUT,'MOVE/V ',DIRECTORY,' ',TPLATE,'/S FILE1');
    WRITELN(AOUT,'DRVD');
    WRITELN(AOUT,'ATOP5',' ',FILENAME,' ',WORD);
    (* replace the two characters after the '.' to name the OUT7 copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='7';
    WRITELN(AOUT,'RENAME OUT7 ',FILENAME);
    (* same again for the OUTT copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='T';
    WRITELN(AOUT,'RENAME OUTT ',FILENAME)
  UNTIL FLAG;
END(*PROGRAM AUTOTOP5*).


PROGRAM AUTOSPEN (INPUT,OUTPUT,AOUT);

VAR
  I:INTEGER;
  DIRECTORY:STRING[20];
  FILENAME:STRING[20];
  WORD:STRING[20];
  FLAG:BOOLEAN;
  AOUT:TEXT;

BEGIN(*PROGRAM AUTOSPEN*)
  FLAG:=FALSE;
  REWRITE(AOUT);
  DIRECTORY:='  ';
  WRITELN('INPUT NAME OF DIRECTORY WHERE PROGRAM WILL RUN.');
  WRITE('DIRECTORY = ');
  READLN(DIRECTORY);
  REPEAT
    FILENAME:='  ';
    WRITELN('INPUT NAME OF SPEECHFILE TO BE PROCESSED.');
    WRITE('FILENAME = ');
    READLN(FILENAME);
    WORD:='  ';
    WRITELN('INPUT WORD(S) SPOKEN.');
    WRITE('WORD(S) = ');
    READLN(WORD);
    WRITELN;
    WRITELN(AOUT,'DELETE/V FILE1 ');
    WRITELN(AOUT,'MOVE/V ',DIRECTORY,' ',FILENAME,'/S FILE1');
    WRITELN(AOUT,'DRVS');
    WRITELN(AOUT,'ASPENPLT',' ',FILENAME,' ',WORD);
    (* replace the two characters after the '.' to name the FILE2 copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='B';
    WRITELN(AOUT,'RENAME FILE2 ',FILENAME);
  UNTIL FLAG;
END(*PROGRAM AUTOSPEN*).

PROGRAM AUTOSPEC (INPUT,OUTPUT,AOUT);

VAR
  I:INTEGER;
  DIRECTORY:STRING[20];
  FILENAME:STRING[20];
  FLAG:BOOLEAN;
  AOUT:TEXT;

BEGIN(*PROGRAM AUTOSPEC*)
  FLAG:=FALSE;
  REWRITE(AOUT);
  DIRECTORY:='  ';
  WRITELN('INPUT NAME OF DIRECTORY WHERE PROGRAM WILL RUN.');
  WRITE('DIRECTORY = ');
  READLN(DIRECTORY);
  REPEAT
    FILENAME:='  ';
    WRITELN('INPUT NAME OF SPEECHFILE TO BE PROCESSED.');
    WRITE('FILENAME = ');
    READLN(FILENAME);
    WRITELN;
    WRITELN(AOUT,'DELETE/V FILE1 ');
    WRITELN(AOUT,'MOVE/V ',DIRECTORY,' ',FILENAME,'/S FILE1');
    WRITELN(AOUT,'DRVS');
    (* replace the two characters after the '.' to name the FILE2 copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='B';
    WRITELN(AOUT,'RENAME FILE2 ',FILENAME);
  UNTIL FLAG;
END(*PROGRAM AUTOSPEC*).

PROGRAM AUTODIST (INPUT,OUTPUT,AOUT2);

VAR
  I:INTEGER;
  DIRECTORY:STRING[20];
  FILENAME:STRING[20];
  FLAG:BOOLEAN;
  AOUT2:TEXT;

BEGIN(*PROGRAM AUTODIST*)
  FLAG:=FALSE;
  REWRITE(AOUT2);
  DIRECTORY:='  ';
  WRITELN('INPUT NAME OF CURRENT DIRECTORY.');
  WRITE('DIRECTORY = ');
  READLN(DIRECTORY);
  REPEAT
    FILENAME:='  ';
    WRITELN('INPUT NAME OF SPEECHFILE TO BE PROCESSED.');
    WRITE('FILENAME = ');
    READLN(FILENAME);
    WRITELN;
    WRITELN(AOUT2,'DELETE/V SPCHFILE ');
    WRITELN(AOUT2,'MOVE/V ',DIRECTORY,' ',FILENAME,'/S SPCHFILE');
    WRITELN(AOUT2,'ADISTSP');
    WRITELN(AOUT2,'ALIST5');
    (* replace the two characters after the '.' to name the OUT2 copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='2';
    WRITELN(AOUT2,'RENAME OUT2 ',FILENAME);
    (* same again for the OUT3 copy *)
    I:=0;
    REPEAT
      I:=I+1;
    UNTIL FILENAME[I]='.';
    I:=I+1;
    FILENAME[I]:='O';
    FILENAME[I+1]:='3';
    WRITELN(AOUT2,'RENAME OUT3 ',FILENAME);
  UNTIL FLAG;
END(*PROGRAM AUTODIST*).


APPENDIX  G 


OTHER  EXPERIMENTS 


Resynthesized  Speech 

An experiment was made using Seelandt's phoneme
templates to resynthesize normal and G-speech with a mask.
The speech files were digitized using Program AUDIOHIST.
The digitized speech files were processed by TRYDIST5, and
the resultant files from TRYDIST5 were processed by LISTER4
(Ref 1). A product of LISTER4 is the file OUT3. OUT3
consists of the top choice phoneme sound for each time slice
(8 ms) when the input speech file is compared to the phoneme
template. OUT3 is used by program TALK to form
resynthesized speech files (Ref 1). The resynthesized
speech files are formed from digitized speech samples of
Seelandt's phonemes. The resynthesized speech files can be
heard by using Program AUDIOHIST.

Forty-five files (HCP0.SP, HCP1.SP, to HCPT.SP;
H300.SP, H301.SP, to H30T.SP; and H500.SP, H501.SP, to
H50T.SP) were processed for resynthesis, and the same files
were processed by program LEARN for recognition results.
Steps in obtaining recognition results from LEARN:

1. Digitize speech (AUDIOHIST: analog in, digitized
   speech out)

2. Extract features (PHDIST, developed from
   TRYDIST5, compares digitized speech to phoneme
   template)

3. Output features (CHOICE5, developed from
   LISTER4, uses outputs from PHDIST to prepare
   for input to LEARN with top 5 choices)

4. Recognition (LEARN uses phoneme representations
   and fuzzy variables to score each file against
   the vocabulary. Input file from CHOICE5)

Phoneme representations, listed in Table G-II, were
chosen from the control files (C) and used by program LEARN.
The overall fuzzy variables and word fuzzy variables were
the same and are listed in Table G-III.

Three people were asked to listen to the resynthesized
speech and try to recognize the utterances, given knowledge
of the 15-word vocabulary. The results of these listening
tests and the recognition results of program LEARN are in
Table G-I. The files are represented by C, 3, and 5, which
stand for control (1G), 3Gs, and 5Gs respectively.


TABLE G-I

RECOGNITION RESULTS


                  PEOPLE LISTENING            PROGRAM
               1ST       2ND       3RD         LEARN
WORD          C 3 5     C 3 5     C 3 5        C 3 5

ZERO          - R -     - R R     - - -        - R -
ONE           R R R     R R R     R R R        R - R
TWO           - - -     - - -     - - -        R - R
THREE         R R -     R R -     R R -        R - R
FOUR          - - -     - - -     - - -        - R -
FIVE          - - -     - - -     - - -        - - -
SIX           - - -     - - -     - - -        R R R
SEVEN         - - -     - - -     - - -        R - -
EIGHT         - - -     - - -     - R -        R - R
NINE          - - -     - - -     - - -        R R R
CCIP          R - R     R - R     - - -        R R R
ENTER         - R -     - - -     R R -        R R R
FREQUENCY     - - -     - - -     - - -        R - R
STEP          - - -     - - -     - - -        R - -
THREAT        - - -     - - -     - - -        - - -

RECOGNITION    20%       20%      17.8%         60%


R=CORRECTLY RECOGNIZED
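
The percentages in the bottom row are consistent with 45 utterances per column (15 words at each of the three G levels). A small Python check of that arithmetic, using a hypothetical helper name and rounding to one decimal as the table does:

```python
def percent_recognized(correct: int, total: int = 45) -> float:
    """Percentage of utterances recognized, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)
```

Under this reading, 9 correct utterances give 20%, 8 give 17.8%, and the 60% reported for LEARN corresponds to 27 of 45.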


TABLE G-II

PHONEME REPRESENTATION USED


              PHONEME
WORD          REPRESENTATION

ZERO          5-39-10-38
ONE           13-14-15-67-70
TWO           36-69-22-17-23-10-33
THREE         22-5-47-28-13
FOUR          67-39-67-41
FIVE          36-52-39-47-48
SIX           36-47-28-1-36
SEVEN         19-36-39-17-70
EIGHT         29-30-1-67-40
NINE          70-17-29-57-70
CCIP          36-29-36-29-1-29-30
ENTER         14-57-70-17-47
FREQUENCY     64-47-69-17-70-28-30
STEP          19-3-36-28-17-39
THREAT        67-64-5-47-64-67

TABLE G-III

FUZZY VARIABLES


STHR  = 1.0E+00
INSE  = 1.5E+00     INSF  = 1.0E+00
DELE  = 1.0E+00     DELF  = 8.0E-01     DELG  = 1.5E-01
DCNE  = 1.0E+00     DCNF  = 1.2E+00     DCNG  = 5.0E-01
SFE   = 2.0E+00     SFF   = 2.0E+00
CHVE  = 4.0E+00     CHVF  = 2.5E-01
STATE = 1.0E+00     STATF = 1.2E+00     STATG = 0.0E+00
THR1E = 1.2E+00     THR1F = 1.0E+00
THR2E = 1.5E+00     THR2F = 1.0E+00
SUBE  = 1.0E+00     SUBF  = 1.0E+00


Independent  Speaker 


The main body of this thesis used speech files from one
subject. This section used speech files from an independent
speaker, and the phoneme template of Figure 18 was used to
extract features. The phoneme representation used in this
section is identical to that used on page 41. For an idea
of how an independent speaker would affect the overall
system, Tables G-IV and G-V can be compared to Table V in
the main body. This experiment is identical to the one in
Table V, page 45, except the speaker is different.

Ninety-nine files were used with the prefixes SCA,
SCB, SCC, S30, S3A, S50, and S5A. These files are listed in
Appendix A. Files for this experiment were processed just
like the files whose results are depicted in Table V, page
45.

"No training" in Table G-IV means that program LEARN was set
up for the speaker used in the main body of this thesis and
no statistics were gathered on the independent speaker's
files. Table G-V used files with no G-stress to train
program LEARN for the independent speaker. Files A, B, and
C are without G-stress, while files 3, 3A, 5, and 5A have
various levels of G-stress (3=3Gz, 3A=3Gz and 1.5Gy, 5=5Gz,
5A=5Gz and 1.5Gy).


TABLE G-IV

RECOGNITION SCORES FOR INDEPENDENT SPEAKER
NO TRAINING


Phoneme Length: 5 vector
Distance Rule:  M1

WORDS                                FILES
TO BE           NO TRAINING SET            RECOGNITION SET
RECOGNIZED    A       B       C       3       3A      5       5A

ZERO          .67     .71     .66     .69     .73     -       -
ONE           .66     .72     .72     .78     .65*    .72*    .54*
TWO           .75     .72     .81     .76*    .81     .72     .65*
THREE         .63*    .60*    .64*    .66*    -       .64*    .73
FOUR          .66     .56*    .65     .65*    -       .63*    .67*
FIVE          .66     .75     .72     .67     .73     .71     .66*
SIX           .80     .86     .74     .79     .83     .84     .80
SEVEN         .58*    .74     .74     .79     .68     .76     .70
EIGHT         .74     .75     .77     .77     -       .74     .77
NINE          .67     .70     .68     .73     .71     .74     .69
CCIP          .68     .67     .72     .75     -       .68     .68*
ENTER         .79     .65*    .71     .81     .78     .51*    .65*
FREQUENCY     .66     .69     .63     .69     .66*    .65*    .68
STEP          .68*    .70*    .81     .70     .75     .83     .72
THREAT        .73     .68     .74     .69     .70*    .70     .73

Percent
Correct       80      73.3    93.3    80      63.6    64.3    57.1

MEAN                  .702            .729    .730    .705    .691
STANDARD
DEVIATION             .051            .053    .059    .084    .062

*Word missed
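
The "Percent Correct" row counts only the files that exist for a column; a dash marks a word with no file. A minimal Python sketch of that calculation follows, using column 5A as read from Table G-IV. The function name and the `(score, missed)` representation are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[float, bool]]   # (score, missed), or None when there is no file


def percent_correct(column: List[Entry]) -> float:
    """Share of available files that were recognized, to one decimal place."""
    present = [e for e in column if e is not None]
    if not present:
        raise ValueError("column has no files")
    hits = sum(1 for _score, missed in present if not missed)
    return round(100.0 * hits / len(present), 1)


# Column 5A of Table G-IV, ZERO through THREAT (ZERO has no file).
COLUMN_5A: List[Entry] = [
    None, (.54, True), (.65, True), (.73, False), (.67, True),
    (.66, True), (.80, False), (.70, False), (.77, False), (.69, False),
    (.68, True), (.65, True), (.68, False), (.72, False), (.73, False),
]
```

With 14 files and 6 misses, `percent_correct(COLUMN_5A)` gives 57.1, matching the table.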




TABLE G-V

RECOGNITION SCORES FOR INDEPENDENT SPEAKER
WITH TRAINING


Phoneme Length: 5 vector
Distance Rule:  M1

WORDS                            FILES
TO BE             TRAINING SET          RECOGNITION SET
RECOGNIZED    A       B       C       3       3A      5

ZERO          .87     .90     .72*    .75     .83     -
ONE           .79     .85     .83     .81     .84     .82
TWO           .77*    .78*    .78*    .74*    .74*    .73*
THREE         .79     .79     .78     .78     -       .76
FOUR          .83     .82     .86     .77*    -       .75*
FIVE          .81     .86     .85     .76     .83     .77
SIX           .82     .85     .85     .80     .78     .78
SEVEN         .74     .86     .84     .83     .76     .84
EIGHT         .82     .84     .84     .77*    -       .71*
NINE          .85     .85     .83     .82     .80     .78
CCIP          .77     .77     .79     .76     -       .75
ENTER         .88     .81     .85     .83     .83     .66*
FREQUENCY     .78     .78     .80     .74*    .76     .72*
STEP          .81     .81     .87     .80     .79     .87
THREAT        .73     .73     .75     .66*    .68*    .67*

Percent
Correct       93.3    93.3    86.7    66.7    81.8    57.1

MEAN                  .813            .775    .785    .758
STANDARD
DEVIATION             .043            .044    .049    .060

*Word missed
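
The MEAN row can be checked against a column of scores. The Python sketch below averages column 3 as read from Table G-V; the helper name is hypothetical, and missed words are included in the average, which is an assumption that happens to reproduce the tabulated value.

```python
from math import fsum
from typing import List


def column_mean(scores: List[float]) -> float:
    """Arithmetic mean of a column of recognition scores, to three decimals."""
    if not scores:
        raise ValueError("no scores")
    return round(fsum(scores) / len(scores), 3)


# Column 3 of Table G-V, ZERO through THREAT.
COLUMN_3 = [.75, .81, .74, .78, .77, .76, .80, .83,
            .77, .82, .76, .83, .74, .80, .66]
```

The fifteen scores sum to 11.62, so `column_mean(COLUMN_3)` gives 0.775, the second value in the MEAN row.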


128-point  DFT  Recognition 

This section contains the results of recognition work
using a 128-point DFT instead of a 64-point DFT. The
128-point results used a 1-vector template to extract features
from the following speech files: HCP0.SP, HCP1.SP, to
HCPT.SP; H300.SP, H301.SP, to H30T.SP; and H500.SP,
H501.SP, to H50T.SP. The results of recognition using
program LEARN can be found in Table G-VI. The phoneme
representations used in program LEARN for this experiment
are listed in Table G-VII. Phoneme representations were
chosen from the characteristics of the P files (15
utterances at 1G).


TABLE G-VI

RECOGNITION SCORES FOR 128 POINT DFT


Phoneme Length: 1 vector
Distance Rule:  M2

WORDS                            FILES
TO BE             TRAINING SET          RECOGNITION SET
RECOGNIZED    A       B       P       C       3       5

ZERO          .83     .86     .80     .83     .70     .61
ONE           .73*    .84     .83     .80     .70     .61
TWO           .81     .84     .82     .81     .65*    .71
THREE         .85     .72*    .86     .74     .70*    .5?
FOUR          .81     .85     .75     .80     .67*    .61
FIVE          .86     .86     .84     .84     .60*    .6?
SIX           .81     .83     .79     .79*    .73     .7?
SEVEN         .77     .83     .83     .81     .70*    .6?
EIGHT         .88     .88     .92     .89     .88     .8?
NINE          .81     .81     .78     .80     .64*    .71
CCIP          .78     .83     .79     .81     .80     .7?
ENTER         .83     .79     .84     .77     .80     .61
FREQUENCY     .83     .85     .72     .82     .74     .7?
STEP          .83     .83     .85     .75*    .77     .7?
THREAT        .83     .79     .83     .82     .78     .71

Percent
Correct       93.3    93.3    100     86.7    60      41

MEAN                  .820            .805    .724    .6?
STANDARD
DEVIATION             .041            .036    .073    .0?

*Word missed
?=digit cut off at the page edge


TABLE G-VII

PHONEME REPRESENTATION USED FOR 128-POINT DFT


              PHONEME
WORD          REPRESENTATION

ZERO          2-6-7
ONE           8-10-12
TWO           15-17-7
THREE         18-19-21-22
FOUR          23-24-27
FIVE          28-29-31
SIX           32-1-36
SEVEN         37-39-42-13
EIGHT         43-1-44
NINE          13-45-47
CCIP          37-49-22
ENTER         54-1-56
FREQUENCY     28-49-1-22
STEP          65-1-67
THREAT        68-1-70
